[R] cut2 once, bin twice...

2009-10-22 Thread sdanzige

Hello,

I'm using the Hmisc cut2 function to bin a set of data.  It produces bins
that I like with results like this:

[96,270] : 171
[69, 96) :  54
[49, 69) :  40
[35, 49) :  28
[28, 35) :  14
[24, 28) :   8
(Other)  :  48

I would like to take a second set of data and assign it to bins using the
same cut points defined by my earlier call to cut2.

Does anyone know how I can do this?
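One approach (an editor's sketch, assuming the Hmisc package is installed: cut2 can report the break points it chose via onlycuts = TRUE, and accepts them back through its cuts argument):

```r
library(Hmisc)

set.seed(1)
x <- rexp(400, rate = 1/80)

# First data set: let cut2 choose the bins, and also ask for the cut points
cuts   <- cut2(x, g = 6, onlycuts = TRUE)
bins_x <- cut2(x, cuts = cuts)

# Second data set: reuse the same cut points
y <- rexp(100, rate = 1/80)
bins_y <- cut2(y, cuts = cuts)
table(bins_y)
```

Both factors then share the same levels, so counts from the two data sets are directly comparable.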

Thank you,
-S
-- 
View this message in context: 
http://www.nabble.com/cut2-once%2C-bin-twice...-tp26020736p26020736.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] coxph() and survfit()

2009-10-22 Thread Mura Tamakou
Dear All,

I have a question regarding the output of survfit() when I supply a Cox model. 
Lets say for example:

library(survival)

fit <- coxph(Surv(time, status == 2) ~ factor(spiders), data = pbc)
fit # HR for spiders is significant


newdata <- data.frame(spiders = factor(0:1))
sf <- survfit(fit, newdata = newdata)
sum.sf <- summary(sf, times = c(2000, 2500, 3000))

# survival estimates for the yes/no spiders
# and the 3 follow up times
sum.sf$surv

# corresponding lower limits of the
# 95% CI
sum.sf$low

# corresponding upper limits of the
# 95% CI
sum.sf$up


we observe that the 95% CIs overlap. How is this possible, given that the HR
for spiders is significant?
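An editor's aside (a base-R sketch, not from the original thread): checking whether two individual 95% CIs overlap is a more conservative procedure than testing their difference, so overlapping survival CIs can coexist with a significant hazard ratio.

```r
# Two independent estimates, each with standard error 1
est1 <- 0
est2 <- 3
se   <- 1

# The individual 95% CIs overlap: (-1.96, 1.96) and (1.04, 4.96)
ci1 <- est1 + c(-1.96, 1.96) * se
ci2 <- est2 + c(-1.96, 1.96) * se
overlap <- ci2[1] < ci1[2]

# ...but the test of the difference is still significant at the 5% level
z <- (est2 - est1) / sqrt(se^2 + se^2)   # 2.12 > 1.96
```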

Regards,

Mura



Re: [R] Functional data analysis - problems with smoothing

2009-10-22 Thread Peter Ehlers

The error message is pretty clear: regardless of what
*you* think, R says that 'isi' is not numeric.

Are you sure that 'isi' is not a *factor* object?
I'm willing to bet that it is.

Use str() to check your data.
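A minimal base-R illustration of the point (hypothetical data, added by the editor):

```r
isi <- factor(c(10, 20, 30))       # prints like numbers, but...
is.numeric(isi)                    # FALSE: it is a factor

# Note: as.numeric(isi) would return the level codes 1, 2, 3.
# The safe conversion goes through as.character():
isi_num <- as.numeric(as.character(isi))
str(isi_num)                       # num [1:3] 10 20 30
```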

 -Peter Ehlers

Benjamin Cheah wrote:

Hi all,

I'm having major issues with smoothing my functional data. I'm referring to
Jim Ramsay's examples in his books. The following error message keeps
appearing, despite all my data being numeric. Can anyone kindly offer any
suggestions?

isi - vector of argument values, i.e. the independent variable of the curves
rlz - data array
TMSfdPar - functional data parameter


TMSfdPar = fdPar(TMSbasis, 4, 0.01)
TMSfdsmooth = smooth.basis(isi, rlz, TMSfdPar)

Error in smooth.basis(isi, rlz, TMSfdPar) : 'argvals' is not numeric.

I don't understand why the error message keeps popping up. I've tried playing
around with Jim Ramsay's datasets, and I think my data is organised in a
similar manner to his, so I can't understand what's going on. Oh, the
frustration!

Thanks in advance,

Ben
(the amateur R data analyst and statistician.)



  


[R] Functional data analysis - problems with smoothing

2009-10-22 Thread Benjamin Cheah
Hi all,

I'm having major issues with smoothing my functional data. I'm referring to
Jim Ramsay's examples in his books. The following error message keeps
appearing, despite all my data being numeric. Can anyone kindly offer any
suggestions?

isi - vector of argument values, i.e. the independent variable of the curves
rlz - data array
TMSfdPar - functional data parameter

> TMSfdPar = fdPar(TMSbasis, 4, 0.01)
> TMSfdsmooth = smooth.basis(isi, rlz, TMSfdPar)
Error in smooth.basis(isi, rlz, TMSfdPar) : 'argvals' is not numeric.

I don't understand why the error message keeps popping up. I've tried playing
around with Jim Ramsay's datasets, and I think my data is organised in a
similar manner to his, so I can't understand what's going on. Oh, the
frustration!

Thanks in advance,

Ben
(the amateur R data analyst and statistician.)



  


Re: [R] Bayesian regression stepwise function?

2009-10-22 Thread Charles C. Berry

On Thu, 22 Oct 2009, Ben Bolker wrote:





Allan.Y wrote:


Hi everyone,

I am wondering if there exists a stepwise regression function for the
Bayesian regression model.  I tried googling, but I couldn't find
anything.  I know "step" function exists for regular stepwise regression,
but nothing for Bayes.



Why?  That seems so ... un-Bayesian ...


If 'fools rush in where angels fear to tread', then Bayesians 'jump' in 
where frequentists fear to 'step'...


Seriously, there are Bayesian regression approaches that place a prior on the 
model size (sometimes only implicitly, by assigning a prior for the 
inclusion of each candidate regressor). They then 'jump' between models 
of different sizes.


On CRAN, Package qtlbim (which is specialized to a particular genetics 
problem) implements one such, I think.


Package bqtl does not implement the jumping approach, but does explore a 
model space with differing numbers of regressors for the same (qtl) 
problem.


Perhaps the closest to a general purpose 'stepwise flavored' Bayesian 
regression is implemented in Package BMA, which IIRC borrows step() for 
some of its work.


But CRAN now has more packages than my cortex has neurons, so there are
probably more packages that do something like this. Try

RSiteSearch("jump regression", restrict='functions')

and start reading.
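For a concrete starting point, an editor's sketch using Package BMA (the bicreg call below is illustrative; check the package documentation for details):

```r
# Editor's sketch (assumes the BMA package is installed)
library(BMA)

set.seed(1)
X <- as.data.frame(matrix(rnorm(100 * 4), ncol = 4))
y <- 2 * X$V1 - X$V2 + rnorm(100)

fit <- bicreg(X, y)      # Bayesian model averaging over regressor subsets
summary(fit)             # posterior inclusion probability for each regressor
```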

HTH,

Chuck






--
View this message in context: 
http://www.nabble.com/Bayesian-regression-stepwise-function--tp26013725p26015081.html
Sent from the R help mailing list archive at Nabble.com.




Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] How to make R packages?

2009-10-22 Thread Jorge Ivan Velez
Hi Peng,
Also, take a look at the following links:
http://robjhyndman.com/research/Rpackages_notes.pdf
http://www.biostat.wisc.edu/~kbroman/Rintro/Rwinpack.html

HTH,
Jorge


On Fri, Oct 23, 2009 at 12:10 AM, Peng Yu <> wrote:

> I found the following document on making R packages. But it is old.
> I'm wondering if there is more current ones and hopefully more
> complete ones.
>
> http://biosun1.harvard.edu/courses/individual/bio271/lectures/L6/Rpkg.pdf
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] How to make R packages?

2009-10-22 Thread Benilton Carvalho

Maybe

http://cran.r-project.org/doc/manuals/R-exts.pdf

?

b

On Oct 23, 2009, at 2:10 AM, Peng Yu wrote:


I found the following document on making R packages. But it is old.
I'm wondering if there is more current ones and hopefully more
complete ones.

http://biosun1.harvard.edu/courses/individual/bio271/lectures/L6/Rpkg.pdf



Re: [R] splitting a vector of strings...

2009-10-22 Thread andrew
The following works: the double backslash escapes |, which otherwise
means "or" in a regex.  (Bill Dunlap showed that you don't need sapply
for it to work.)

xs <- "this is | string"
xsv <- paste(xs, 1:10)
strsplit(xsv, "\\|")
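An alternative worth noting (editor's addition): with fixed = TRUE the pattern is taken literally, which sidesteps regex escaping and handles the " | " (space pipe space) delimiter from the original question directly:

```r
xs  <- "this is | string"
xsv <- paste(xs, 1:10)

# fixed = TRUE treats " | " as a literal string, not a regex
parts <- strsplit(xsv, " | ", fixed = TRUE)
parts[[1]]   # "this is"  "string 1"
```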


On Oct 23, 3:50 pm, Jonathan Greenberg  wrote:
> William et al:
>
>     Thanks!  I think I have a somewhat more complicated issue due to the
> type of string I'm using -- the split is " | " (space pipe space) -- how
> do I code that based on your sub code below?  Using " | *" doesn't seem
> to be working.  Thanks!
>
> --j
>
>
>
> William Dunlap wrote:
> >> -Original Message-
> >> From: r-help-boun...@r-project.org
> >> [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Greenberg
> >> Sent: Thursday, October 22, 2009 7:35 PM
> >> To: r-help
> >> Subject: [R] splitting a vector of strings...
>
> >> Quick question -- if I have a vector of strings that I'd like
> >> to split
> >> into two new vectors based on a substring that is inside of
> >> each string,
> >> what is the most efficient way to do this?  The substring
> >> that I want to
> >> split on is multiple characters, if that matters, and it is
> >> contained in
> >> every element of the character vector.
>
> > strsplit and sub can both be used for this.  If you know
> > the string will be split into 2 parts then 2 calls to sub
> > with slightly different patterns will do it.  strsplit requires
> > less fiddling with the pattern and is handier when the number
> > of parts is variable or large.  strsplit's output often needs to
> > be rearranged for convenient use.
>
> > E.g., I made 100,000 strings with a 'qaz' in their middles with
> >   x<-paste("X",sample(1e5),sep="")
> >   y<-sub("X","Y",x)
> >   xy<-paste(x,y,sep="qaz")
> > and split them by the 'qaz' in two ways:
> >   system.time(ret1<-list(x=sub("qaz.*","",xy),y=sub(".*qaz","",xy)))
> >   # user  system elapsed
> >   # 0.22    0.00    0.21
>
> > system.time({tmp<-strsplit(xy,"qaz");ret2<-list(x=unlist(lapply(tmp,`[`,
> > 1)),y=unlist(lapply(tmp,`[`,2)))})
> >    user  system elapsed
> >   # 2.42    0.00    2.20
> >   identical(ret1,ret2)
> >   #[1] TRUE
> >   identical(ret1$x,x) && identical(ret1$y,y)
> >   #[1] TRUE
>
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
>
> >> --j
>
> >> --
>
> >> Jonathan A. Greenberg, PhD
> >> Postdoctoral Scholar
> >> Center for Spatial Technologies and Remote Sensing (CSTARS)
> >> University of California, Davis
> >> One Shields Avenue
> >> The Barn, Room 250N
> >> Davis, CA 95616
> >> Phone: 415-763-5476
> >> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307
>
> >> __
> >> r-h...@r-project.org mailing list
> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >>http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>
> --
>
> Jonathan A. Greenberg, PhD
> Postdoctoral Scholar
> Center for Spatial Technologies and Remote Sensing (CSTARS)
> University of California, Davis
> One Shields Avenue
> The Barn, Room 250N
> Davis, CA 95616
> Phone: 415-763-5476
> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307
>



Re: [R] splitting a vector of strings...

2009-10-22 Thread Jonathan Greenberg

William et al:

   Thanks!  I think I have a somewhat more complicated issue due to the 
type of string I'm using -- the split is " | " (space pipe space) -- how 
do I code that based on your sub code below?  Using " | *" doesn't seem 
to be working.  Thanks!


--j

William Dunlap wrote:

-Original Message-
From: r-help-boun...@r-project.org 
[mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Greenberg

Sent: Thursday, October 22, 2009 7:35 PM
To: r-help
Subject: [R] splitting a vector of strings...

Quick question -- if I have a vector of strings that I'd like 
to split 
into two new vectors based on a substring that is inside of 
each string, 
what is the most efficient way to do this?  The substring 
that I want to 
split on is multiple characters, if that matters, and it is 
contained in 
every element of the character vector.



strsplit and sub can both be used for this.  If you know
the string will be split into 2 parts then 2 calls to sub
with slightly different patterns will do it.  strsplit requires
less fiddling with the pattern and is handier when the number
of parts is variable or large.  strsplit's output often needs to
be rearranged for convenient use.

E.g., I made 100,000 strings with a 'qaz' in their middles with
  x<-paste("X",sample(1e5),sep="")
  y<-sub("X","Y",x)
  xy<-paste(x,y,sep="qaz")
and split them by the 'qaz' in two ways:
  system.time(ret1<-list(x=sub("qaz.*","",xy),y=sub(".*qaz","",xy)))
  # user  system elapsed 
  # 0.22    0.00    0.21 

system.time({tmp<-strsplit(xy,"qaz");ret2<-list(x=unlist(lapply(tmp,`[`,
1)),y=unlist(lapply(tmp,`[`,2)))})
  #  user  system elapsed 
  # 2.42    0.00    2.20 
  identical(ret1,ret2)

  #[1] TRUE
  identical(ret1$x,x) && identical(ret1$y,y)
  #[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

  

--j

--

Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307





--

Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307



[R] factor analysis: extract principal components until the eigen value <1

2009-10-22 Thread Tiger Guo
Hello,



I am trying to do factor analysis using principal components analysis.
I use "fa" in the package "psych". I would like to extract principal
components until the eigenvalue is < 1. Two questions:

First, can I calculate the eigenvalues of the covariance matrix, count how
many are > 1, and use that count as the number of factors? Compared to SPSS,
the number obtained this way is much larger.

Second, can some functions provided by R do this job?
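A base-R sketch of the "eigenvalue > 1" (Kaiser) rule, added by the editor. Note that SPSS applies the rule to the correlation matrix, not the covariance matrix; applying it to the covariance matrix usually gives a different count, which may explain the discrepancy:

```r
set.seed(1)
X <- matrix(rnorm(200 * 6), ncol = 6)

# Kaiser criterion on the correlation matrix (what SPSS does)
n_cor <- sum(eigen(cor(X))$values > 1)

# The same rule on the covariance matrix can give a different count
n_cov <- sum(eigen(cov(X))$values > 1)
```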



Thanks.



Gencheng Guo

MCCL Lab, ECE,

University of Alberta



[R] How to make R packages?

2009-10-22 Thread Peng Yu
I found the following document on making R packages, but it is old.
I'm wondering if there are more current and, ideally, more complete ones.

http://biosun1.harvard.edu/courses/individual/bio271/lectures/L6/Rpkg.pdf



[R] a problem about integrate function in R .thank you .

2009-10-22 Thread fuzuo xie
e2 <- function(x) {
  out <- 0 * x
  for (i in 1:length(x))
    out[i] <- integrate(function(y) qnorm(y), lower = 0, upper = x[i])$value
  out
}
integrate(e2, lower = 0, upper = a)$value


Above is my code. When a is small, say a < 0.45, the result is right;
however, when a > 0.5 the result is incorrect. Why? Thank you.
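An editor's note: the inner integral has a closed form (substitute y = pnorm(t)), which avoids the nested numerical integration and its trouble near the endpoints:

```r
# Integral of qnorm(y) from 0 to x equals -dnorm(qnorm(x)),
# since d/dx[-dnorm(qnorm(x))] = qnorm(x)
e2_closed <- function(x) -dnorm(qnorm(x))

# Outer integral, now with a well-behaved vectorized integrand
a <- 0.8
integrate(e2_closed, lower = 0, upper = a)$value
```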



Re: [R] splitting a vector of strings...

2009-10-22 Thread andrew
xs <- "this is string"
xsv <- paste(xs, 1:10)
sapply(xsv, function(x) strsplit(x, '\\sis\\s'))

This will split the vector of strings "xsv" on the word 'is' when it has
a space immediately before and after it.



On Oct 23, 1:34 pm, Jonathan Greenberg  wrote:
> Quick question -- if I have a vector of strings that I'd like to split
> into two new vectors based on a substring that is inside of each string,
> what is the most efficient way to do this?  The substring that I want to
> split on is multiple characters, if that matters, and it is contained in
> every element of the character vector.
>
> --j
>
> --
>
> Jonathan A. Greenberg, PhD
> Postdoctoral Scholar
> Center for Spatial Technologies and Remote Sensing (CSTARS)
> University of California, Davis
> One Shields Avenue
> The Barn, Room 250N
> Davis, CA 95616
> Phone: 415-763-5476
> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307
>


Re: [R] splitting a vector of strings...

2009-10-22 Thread William Dunlap
> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Greenberg
> Sent: Thursday, October 22, 2009 7:35 PM
> To: r-help
> Subject: [R] splitting a vector of strings...
> 
> Quick question -- if I have a vector of strings that I'd like 
> to split 
> into two new vectors based on a substring that is inside of 
> each string, 
> what is the most efficient way to do this?  The substring 
> that I want to 
> split on is multiple characters, if that matters, and it is 
> contained in 
> every element of the character vector.

strsplit and sub can both be used for this.  If you know
the string will be split into 2 parts then 2 calls to sub
with slightly different patterns will do it.  strsplit requires
less fiddling with the pattern and is handier when the number
of parts is variable or large.  strsplit's output often needs to
be rearranged for convenient use.

E.g., I made 100,000 strings with a 'qaz' in their middles with
  x<-paste("X",sample(1e5),sep="")
  y<-sub("X","Y",x)
  xy<-paste(x,y,sep="qaz")
and split them by the 'qaz' in two ways:
  system.time(ret1<-list(x=sub("qaz.*","",xy),y=sub(".*qaz","",xy)))
  # user  system elapsed 
  # 0.22    0.00    0.21 
 
system.time({tmp<-strsplit(xy,"qaz");ret2<-list(x=unlist(lapply(tmp,`[`,
1)),y=unlist(lapply(tmp,`[`,2)))})
  #  user  system elapsed 
  # 2.42    0.00    2.20 
  identical(ret1,ret2)
  #[1] TRUE
  identical(ret1$x,x) && identical(ret1$y,y)
  #[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> --j
> 
> -- 
> 
> Jonathan A. Greenberg, PhD
> Postdoctoral Scholar
> Center for Spatial Technologies and Remote Sensing (CSTARS)
> University of California, Davis
> One Shields Avenue
> The Barn, Room 250N
> Davis, CA 95616
> Phone: 415-763-5476
> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307
> 



Re: [R] glm.fit to use LAPACK instead of LINPACK

2009-10-22 Thread Ted

On Thu, 22 Oct 2009, Douglas Bates wrote:


On Thu, Oct 22, 2009 at 10:26 AM, Ravi Varadhan  wrote:

Ted,



LAPACK is newer and is supposed to contain better algorithms than LINPACK.  It 
is not true that LAPACK cannot handle rank-deficient problems.  It can.


It's not just a question of handling rank-deficiency.  It's the
particular form of pivoting that is used so that columns associated
with the same term stay adjacent.

The code that is actually used in glm.fit and lm.fit, called through
the Fortran subroutine dqrls, is a modified version of the Linpack
dqrdc subroutine.


However, I do not know if using LAPACK in glm.fit instead of LINPACK would be 
faster and/or more memory efficient.


The big thing that could be gained is the use of level-3 BLAS.  The
current code uses only level-1 BLAS.  Accelerated BLAS can take
advantage of level 3 calls relative to level-1.



How would I change to level-3 BLAS?  Would I need to rebuild R with some 
flags?  I imagine some comparative benchmarks would be needed.




Even so, I doubt that the QR decomposition is the big time sink in
glm.fit.  Why not profile a representative fit and check?  I did
profile the glm.fit code a couple of years ago and discovered that a
lot of time was being spent evaluating the various family functions
like the inverse link and the variance function and that was because
of calls to pmin and pmax.


What kind of profiling software should I use?  Is Rprof in R able 
to report which part of glm.fit is the bottleneck?
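Yes: Rprof samples the call stack at intervals, and summaryRprof reports time spent per function. A minimal sketch (editor's addition; the design matrix X and response y below are placeholders, not from the thread):

```r
# Editor's sketch: profile a batch of representative fits
set.seed(1)
X <- cbind(1, matrix(rnorm(2e4 * 15), ncol = 15))
y <- rbinom(2e4, 1, 0.5)

Rprof("glmfit.prof")
for (i in 1:20) fit <- glm.fit(X, y, family = binomial())
Rprof(NULL)

head(summaryRprof("glmfit.prof")$by.self)   # time per function, self-time first
```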




Before trying to change very tricky Fortran code you owe it to
yourself to check what the potential gains would be.


Thanks for the suggestions.




- Original Message -
From: Ted 
Date: Thursday, October 22, 2009 10:53 am
Subject: Re: [R] glm.fit to use LAPACK instead of LINPACK
To: "r-help@R-project.org" 



Hi,

I understand that the glm.fit calls LINPACK fortran routines instead of
LAPACK because it can handle the 'rank deficiency problem'.  If my data
matrix is not rank deficient, would a glm.fit function which runs on
LAPACK be faster?  Would this be worthwhile to convert glm.fit to use
LAPACK?  Has anyone done this already??  What is the best way to do this?

I'm looking at very large datasets (thousands of glm calls), and would
like to know if it's worth the effort for performance issues.

Thanks,

Ted

-
Ted Chiang
  Bioinformatics Analyst
  Centre for Computational Biology
  Hospital for Sick Children, Toronto
  416.813.7028
  tchi...@sickkids.ca



[R] splitting a vector of strings...

2009-10-22 Thread Jonathan Greenberg
Quick question -- if I have a vector of strings that I'd like to split 
into two new vectors based on a substring that is inside of each string, 
what is the most efficient way to do this?  The substring that I want to 
split on is multiple characters, if that matters, and it is contained in 
every element of the character vector.


--j

--

Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307



Re: [R] integrate() function error

2009-10-22 Thread andrew
Change e2 to the following and it works

e2 <- function(a) a^2/2

The reason it doesn't work as written is that e2 must be able to handle
vector inputs correctly; your original function does not do this.

from ?integrate

f - an R function taking a numeric first argument and returning a
numeric vector of the same length. Returning a non-finite element will
generate an error.

You could change it to the following

e2 <- function(x) { out <- 0*x; for(n in 1:length(x)) out[n] <-
integrate(function(y) y,lower=0,upper=x[n])$value; out }

and it seems to give the correct answer for me.
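A more compact variant of the same fix (editor's sketch): wrap the scalar function with Vectorize so that integrate receives a vectorized integrand.

```r
e2 <- Vectorize(function(a) integrate(function(x) x, lower = 0, upper = a)$value)

# Inner integral is a^2/2, so the outer integral from 0 to 0.5 is 0.5^3/6
integrate(e2, lower = 0, upper = 0.5)$value   # 0.02083333
```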




On Oct 23, 1:13 pm, fuzuo xie  wrote:
> This is my code , when i run it ,error happed . can you tell me what's the
> reason and modify it ?thank you very much !! the error is "evaluation of
> function gave a result of wrong length"
>
> e2<-function(a) integrate(function(x) x,lower=0,upper=a)$value
> integrate(e2,lower=0, upper=0.5)$value



[R] integrate() function error

2009-10-22 Thread fuzuo xie
This is my code; when I run it, an error happens. Can you tell me the
reason and how to fix it? Thank you very much! The error is "evaluation of
function gave a result of wrong length".


e2<-function(a) integrate(function(x) x,lower=0,upper=a)$value
integrate(e2,lower=0, upper=0.5)$value



Re: [R] inspect s4 methods

2009-10-22 Thread Douglas Bates
On Thu, Oct 22, 2009 at 8:11 PM, Ista Zahn  wrote:
> Hi everyone,
> I'm sure this has been asked before, but I could not find the right
> search terms to find it.
>
> If I want to see what summarizing a model fit with lm() does, I can
> write "summary.lm". But what if I want to see what summarizing a model
> fit with lmer() (lme4 package) does?
>
> showMethods(summary)
> Function: summary (package base)
> object="ANY"
> object="character"
>    (inherited from: object="ANY")
> object="mer"
> object="sparseMatrix"
> object="summary.mer"
>
> tells me there is a "mer" method, but  "summary.mer" gives me no joy.
> How can I see what summary.mer does? Do I have to inspect the source
> code?

Actually if you had followed up with the documentation for showMethods
you would have found out.

showMethods(summary, class = "mer", include = TRUE)

More details are given in Uwe's article in issue 4 of 2006 R News
http://cran.r-project.org/doc/Rnews/Rnews_2006-4.pdf
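Another route (editor's note; assumes lme4 is loaded so the method is registered, and that the fitted-model class is "mer", as it was in the lme4 version current at the time):

```r
library(lme4)                     # registers the S4 summary method

getMethod("summary", "mer")       # prints the method body directly
# selectMethod("summary", "mer")  # like getMethod, but follows inheritance
```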



Re: [R] glm.fit to use LAPACK instead of LINPACK

2009-10-22 Thread Douglas Bates
On Thu, Oct 22, 2009 at 10:26 AM, Ravi Varadhan  wrote:
> Ted,

> LAPACK is newer and is supposed to contain better algorithms than LINPACK.  
> It is not true that LAPACK cannot handle rank-deficient problems.  It can.

It's not just a question of handling rank-deficiency.  It's the
particular form of pivoting that is used so that columns associated
with the same term stay adjacent.

The code that is actually used in glm.fit and lm.fit, called through
the Fortran subroutine dqrls, is a modified version of the Linpack
dqrdc subroutine.

> However, I do not know if using LAPACK in glm.fit instead of LINPACK would be 
> faster and/or more memory efficient.

The big thing that could be gained is the use of level-3 BLAS.  The
current code uses only level-1 BLAS.  Accelerated BLAS can take
advantage of level 3 calls relative to level-1.

Even so, I doubt that the QR decomposition is the big time sink in
glm.fit.  Why not profile a representative fit and check?  I did
profile the glm.fit code a couple of years ago and discovered that a
lot of time was being spent evaluating the various family functions
like the inverse link and the variance function and that was because
of calls to pmin and pmax.

Before trying to change very tricky Fortran code you owe it to
yourself to check what the potential gains would be.

> - Original Message -
> From: Ted 
> Date: Thursday, October 22, 2009 10:53 am
> Subject: Re: [R] glm.fit to use LAPACK instead of LINPACK
> To: "r-help@R-project.org" 
>
>
>> Hi,
>>
>> I understand that the glm.fit calls LINPACK fortran routines instead of
>> LAPACK because it can handle the 'rank deficiency problem'.  If my data
>> matrix is not rank deficient, would a glm.fit function which runs on
>> LAPACK be faster?  Would this be worthwhile to convert glm.fit to use
>> LAPACK?  Has anyone done this already??  What is the best way to do this?
>>
>> I'm looking at very large datasets (thousands of glm calls), and would
>> like to know if it's worth the effort for performance issues.
>>
>> Thanks,
>>
>> Ted
>>
>> -
>> Ted Chiang
>>   Bioinformatics Analyst
>>   Centre for Computational Biology
>>   Hospital for Sick Children, Toronto
>>   416.813.7028
>>   tchi...@sickkids.ca
>>



[R] inspect s4 methods

2009-10-22 Thread Ista Zahn
Hi everyone,
I'm sure this has been asked before, but I could not find the right
search terms to find it.

If I want to see what summarizing a model fit with lm() does, I can
write "summary.lm". But what if I want to see what summarizing a model
fit with lmer() (lme4 package) does?

showMethods(summary)
Function: summary (package base)
object="ANY"
object="character"
(inherited from: object="ANY")
object="mer"
object="sparseMatrix"
object="summary.mer"

tells me there is a "mer" method, but  "summary.mer" gives me no joy.
How can I see what summary.mer does? Do I have to inspect the source
code?
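For S4 generics like this one, the method body can be retrieved with getMethod() or selectMethod() rather than by typing a summary.foo name. A minimal sketch, assuming the lme4 package (which defines the "mer" class) is installed:

```r
library(lme4)

## getMethod() returns the method defined for an exact signature;
## selectMethod() also follows inheritance to find the one dispatched.
getMethod("summary", "mer")
selectMethod("summary", "mer")

## showMethods() with includeDefs = TRUE prints method bodies as well:
showMethods("summary", classes = "mer", includeDefs = TRUE)
```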

-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org



Re: [R] PDF too large, PNG bad quality

2009-10-22 Thread Lasse Kliemann
* Message by -Greg Snow- from Thu 2009-10-22:
 
> If you want to go the pdf route, then you need to find some way 
> to reduce redundant information while still getting the main 
> points of the plot.  With so many point, I would suggest 
> looking at the hexbin package (bioconductor I think) as one 
> approach, it will not be an identical scatterplot, but will 
> convey the information (possibly better) with much smaller 
> graphics file sizes.  There are other tools like sunflower 
> plots or others, but hexbin has worked well for me.
 
I took a look at the 'hexbin' package, and it is really 
interesting. You were right that it also helps to better display 
the data. Finally, this forced me to learn using the 'grid' 
package :-) I think I will use a pretty high number of bins, so 
the plot looks similar to the scatter plots I am used to -- with 
the addition of colors giving different densities.
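For anyone reading along, a minimal sketch of that approach on synthetic data (the variable names here are made up, not from the original plot):

```r
library(hexbin)  # on CRAN; originally distributed via Bioconductor

set.seed(1)
x <- rnorm(1e5)
y <- x + rnorm(1e5)

## A high xbins value makes the result resemble an ordinary
## scatter plot, with color encoding the local point density.
hb <- hexbin(x, y, xbins = 80)
plot(hb)  # grid-based plot with a count legend
```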

> If you want to go the png route, the problem usually comes from 
> scaling the plot after producing it.  So, the solution is to 
> create the plot at the exact size and at the exact resolution 
> that you want to use it at in your document so that no scaling 
> needs to be done.  Use the png function, but don't accept the 
> defaults, choose the size and resolution.  If you later decide 
> on a different size of graph, recreate the file, don't let 
> LaTeX rescale the first one.

This was my strategy so far. For instance, for a figure that is 
to span the whole text block from left to right:

two_third_a4 <- 8.3 * 2/3
png("new.png",
    width=two_third_a4,
    height=two_third_a4,
    units="in",
    res=300)
plot(...)
dev.off()  # close the device so the PNG file is actually written

Earlier I wrote that the PNG looks good when displayed 
separately, but looks inferior when embedded in the LaTeX PDF 
document. However, I now believe that the dependence is more on 
the viewer application. It looks good displayed separately with 
'qiv', but not with 'feh'. The PDF document looks inferior when 
displayed with 'evince' or 'epdfview', but it looks okay when 
displayed with 'xpdf'. I presume now that this phenomenon is not 
directly R-related.

I thank you and everyone who responded so quickly.

Lasse




Re: [R] removing random effect from nlme or using varPower() in nls

2009-10-22 Thread Kingsford Jones
help(gnls, pack=nlme)


hth,
Kingsford Jones
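That is, gnls() fits the fixed-effects part of a nonlinear model without any random effects, while accepting the same variance structures as nlme(). A hedged sketch (mydata, y and x are placeholders, not objects from the original post):

```r
library(nlme)

## gnls() = an nls-style fit plus nlme-style variance modeling;
## 'weights = varPower()' works here just as it does in nlme().
fit <- gnls(y ~ SSlogis(x, Asym, xmid, scal),
            data = mydata,
            weights = varPower())
summary(fit)
```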

On Thu, Oct 22, 2009 at 4:36 PM, Michael A. Gilchrist  wrote:
> Hello,
>
> I've been fitting a random effects model using nlme to some data, but I am
> discovering that the variation in my random effect is very small.  As a
> result, I would like to replace it as a fixed effect (i.e. essentially fit
> the same model but with no random effect).
>
> As I understand it I could do this using nls(), but I'm using a number of
> options such as weights = varPower() which I am at a loss on how to
> implement in that framework.
>
> Is there a way to use nlme but with out a random effect? (a bit absurd, I
> realize, but I have the syntax working...)
>
> Alternatively, is there a way to use "weights = varPower()" with nls?
>
> Any help would be appreciated.
>
> Sincerely,
>
> Mike
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Recode issue

2009-10-22 Thread Ista Zahn
Hi Nikhil,
The problem is that your initial "as.factor(0)" causes x to have
values of  "1" : "20" instead of 1 : 20. There are two possible
solutions:

1)
> library(car)
> x <- 1:20
> y <- as.factor(recode(x, " 1:5='A'; 6:10='B'; 11:15='C'; 16:20='D' "))
> y
 [1] A A A A A B B B B B C C C C C D D D D D
Levels: A B C D

2)
> x <- as.factor(1:20)
> y <- recode(x, " c('1','2','3','4','5')='A'; c('6','7','8','9','10')='B'; 
> c('11','12','13','14','15')='C'; c('16','17','18','19','20')='D' ")
> y
 [1] A A A A A B B B B B C C C C C D D D D D
Levels: A B C D

Solution 1 requires a lot less typing...

-Ista


On Thu, Oct 22, 2009 at 6:43 PM, Nikhil Kaza  wrote:
> I am having trouble with the recode function that is provided in the CAR
> package. I am trying to create new factors based on existing factors.
>
> E.g.
>
>>x <- as.factor(1:20)
>>y <- recode(x, " 1:5='A'; 6:10='B'; 11:15='C'; 16:20='D' ")
>>y
>
> [1] A A A A A 6 7 8 9 A A A A A A A A A A A
> Levels: 6 7 8 9 A
>
> Could someone point to me, my error? Thanks
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org



[R] Recode issue

2009-10-22 Thread Nikhil Kaza
I am having trouble with the recode function that is provided in the
CAR package. I am trying to create new factors based on existing factors.


E.g.

>x <- as.factor(1:20)
>y <- recode(x, " 1:5='A'; 6:10='B'; 11:15='C'; 16:20='D' ")
>y

[1] A A A A A 6 7 8 9 A A A A A A A A A A A
Levels: 6 7 8 9 A

Could someone point to me, my error? Thanks



[R] Linear Regression Model with GARCH errors

2009-10-22 Thread Ioana D

Hello

I was wondering if there is any possibility of estimating a linear
regression model with GARCH errors in R? I tried using fGarch library, but I
could only find for instance how to estimate arma-garch models or similar
combinations.

Thank you,

Ioana



[R] removing random effect from nlme or using varPower() in nls

2009-10-22 Thread Michael A. Gilchrist

Hello,

I've been fitting a random effects model using nlme to some data, but I am 
discovering that the variation in my random effect is very small.  As a result, 
I would like to replace it as a fixed effect (i.e. essentially fit the same 
model but with no random effect).


As I understand it I could do this using nls(), but I'm using a number of 
options such as weights = varPower() which I am at a loss on how to implement 
in that framework.


Is there a way to use nlme but with out a random effect? (a bit absurd, I 
realize, but I have the syntax working...)


Alternatively, is there a way to use "weights = varPower()" with nls?

Any help would be appreciated.

Sincerely,

Mike



Re: [R] help sub setting data frame

2009-10-22 Thread Sean MacEachern
Works perfectly!

Thanks to all who responded.

Sean

On Thu, Oct 22, 2009 at 6:24 PM, Ista Zahn  wrote:
> Is this what you want?
>
> df = data.frame('id'=c(1:100),'res'=c(1001:1100))
> dfb=df[1:10,]
> dfc = df[df$id %in% dfb$id,]
>
> Still not sure, but that's my best guess. Going back to your original
> data you can try
>
>  dfb = chkPd[chkPd$PN %in% df$PN,]
>
> Hope it helps,
> Ista
>
> On Thu, Oct 22, 2009 at 6:10 PM, Sean MacEachern  
> wrote:
>> Hi Ista,
>>
>> I think I'm suffering long dayitis myself. You are probably right. I
>> don't use subset that often. I typically use brackets to subset
>> dataframes. Essentially what I am trying to do is take my original
>> dataframe (chkPd) and subset it using a smaller dataframe with some
>> matching PN IDs. They are only a few hundred rows different in size so
>> subset wouldn't be appropriate here. I'm just struggling to figure out
>> what's going wrong in my first example.
>> for instance if I try:
>>> df = data.frame('id'=c(1,2,3,4),'res'=c(10,10,20,20))
>>> dfb=df[1:2]
>>> dfc = df[dfb$id,]
>>
>> I get something along the lines of what I'd expect where my new
>> dataframe is a subset of the original based on the matching ids I
>> specified in dfb$id. Is that wrong in my first example?
>>
>> Cheers,
>>
>> Sean
>>
>> On Thu, Oct 22, 2009 at 4:55 PM, Ista Zahn  wrote:
>>> Hi Sean,
>>> Comment in line below.
>>>
>>> On Thu, Oct 22, 2009 at 5:39 PM, Sean MacEachern  
>>> wrote:
 Hi,

 I'm running into a problem subsetting a data frame that I have never
 encountered before:

> dim(chkPd)
 [1] 3213    6

> df = head(chkPd)
> df
               PN        WB      Sire     Dam   MG SEX
 601      1001  715349   61710   61702   67    F
 969  1001_1  511092 616253 615037 168    F
 986  1002_1  511082 616253 623905 168    F
 667      1003  715617   61817   61441   67    F
 1361 1003_1 510711 635246 627321 168    F
 754       1004 715272   62356   61380  67     F


> dfb = chkPd[df$PN,]
> dfb
            PN     WB   Sire    Dam  MG  SEX
 1001    2114_1 510944 616294 614865 168    M
 NA             NA        NA 
 NA.1           NA        NA 
 1003    1130_1 510950 616294 619694 168    F
 NA.2           NA        NA 
 1004 2221-SHR2 510952 616294 619694 168    M


 I'm not sure why I'm getting this behaviour? By sub-setting the
 original data frame by PN I seem to be pulling out row numbers?
 Therefore I am only getting results where PN is less than the
 dimensions of the original data frame and of course nothing where PN
 has _ in the id. I have also tried using subset but haven't had any
 luck with that either.
>>>
>>> That is the documented behavior as far as I can tell. See
>>>
>>> ?"[.data.frame"
>>>
>>> Maybe my brain is going soft at the end of a long day, but I can't
>>> tell what you're trying to do. Can you clarify?
>>>
>>> -Ista
>>>


>dfb = subset(chkPd, PN==df$PN)
 Warning message:
 In PN == df$PN :
  longer object length is not a multiple of shorter object length

 I wasn't aware that both the larger data frame had to be a multiple of
 the object you were sub-setting . In any case I would appreciate any
 insight into what I may be doing wrong.

 Cheers,

 Sean


> sessionInfo()
 R version 2.9.1 (2009-06-26)
 i386-apple-darwin8.11.1

 locale:
 en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] splines   stats     graphics  grDevices utils     datasets  methods   
 base

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

>>>
>>>
>>>
>>> --
>>> Ista Zahn
>>> Graduate student
>>> University of Rochester
>>> Department of Clinical and Social Psychology
>>> http://yourpsyche.org
>>>
>>
>
>
>
> --
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org
>



[R] dgTMatrix --- [, , drop=F] strange behavior, Matrix 0.999375-20

2009-10-22 Thread Jose Quesada

Hi,

I have something strange here...
I want to subset a large sparse matrix, with the subset being still in
sparse form. Easily achievable with mm[i,,drop=F] , right? Well, it
doesn't work on the matrix I'm working on.

This is a very large wikipedia Matrix (now I can play with it as I just
got 16Gb of memory):

> mm@Dim
[1]  793251 1027355
> nnzero(mm)
[1] 205746204
> class(mm)
[1] "dgTMatrix"
attr(,"package")
[1] "Matrix"

Imagine I want just the first row:

> a = mm[1,,drop=F]
> length(a)
[1] 1027355
> nnzero(a)
[1] 1291
>class(a)
"numeric"

However, _a_ is just a numeric (dense) vector (!). This takes a long
time and a big bite off my memory.
This is NOT what I get when I play with a toy example. In this case it
works as expected:

> (m <- Matrix(c(0,0,2:0), 3,5))
3 x 5 sparse Matrix of class "dgCMatrix"

[1,] . 1 . . 2
[2,] . . 2 . 1
[3,] 2 . 1 . .
> a = m[1,,drop=F]
> a
1 x 5 sparse Matrix of class "dgCMatrix"

[1,] . 1 . . 2

I wonder if it's a matter of size:
> M <- kronecker(m, diag(1))
> a = m[1,,drop=F]
> class(a)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"

Looks like it's working as expected on a sizeable example. What could be
going on? What's the difference between mm and M (other than size)?
I'm not sure this is enough info to reproduce the problem (i.e., the
example matrices I posted work, mine doesn't). The problematic matrix is
.5gb compressed as .Rdata, so it's not easy to send it around.

This is  0.999375-20, not the latest version as of today: 0.999375-29;
however it's the only one available in the revolution 64 repositories
(win64).

Any idea what to try next? I cannot afford to operate with dense
representations.
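One workaround that may be worth trying, assuming the coercion itself fits in memory, is converting the triplet-form dgTMatrix to the column-compressed class before subsetting, since the toy examples that behave correctly are dgCMatrix objects:

```r
library(Matrix)

## Toy stand-in for the large triplet-form matrix (mm in the post)
mt <- as(Matrix(c(0, 0, 2:0), 3, 5), "TsparseMatrix")
class(mt)  # a dgTMatrix, like the wikipedia matrix

## Coerce to column-compressed form, then subset
mc <- as(mt, "CsparseMatrix")
a <- mc[1, , drop = FALSE]
class(a)   # stays a sparse 1 x 5 matrix
```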

Best,
- -Jose
- --
=== I'm learning a new keyboard layout. I type slowly. Please excuse
typos and slower-that-usual responses. Thanks for your patience===
Jose Quesada, PhD.

Max Planck Institute,
Center for Adaptive Behavior and Cognition -ABC-,
Lentzeallee 94, office 224, 14195 Berlin

http://www.josequesada.name/
http://twitter.com/Quesada




Re: [R] help sub setting data frame

2009-10-22 Thread Ista Zahn
Is this what you want?

df = data.frame('id'=c(1:100),'res'=c(1001:1100))
dfb=df[1:10,]
dfc = df[df$id %in% dfb$id,]

Still not sure, but that's my best guess. Going back to your original
data you can try

 dfb = chkPd[chkPd$PN %in% df$PN,]

Hope it helps,
Ista

On Thu, Oct 22, 2009 at 6:10 PM, Sean MacEachern  wrote:
> Hi Ista,
>
> I think I'm suffering long dayitis myself. You are probably right. I
> don't use subset that often. I typically use brackets to subset
> dataframes. Essentially what I am trying to do is take my original
> dataframe (chkPd) and subset it using a smaller dataframe with some
> matching PN IDs. They are only a few hundred rows different in size so
> subset wouldn't be appropriate here. I'm just struggling to figure out
> what's going wrong in my first example.
> for instance if I try:
>> df = data.frame('id'=c(1,2,3,4),'res'=c(10,10,20,20))
>> dfb=df[1:2]
>> dfc = df[dfb$id,]
>
> I get something along the lines of what I'd expect where my new
> dataframe is a subset of the original based on the matching ids I
> specified in dfb$id. Is that wrong in my first example?
>
> Cheers,
>
> Sean
>
> On Thu, Oct 22, 2009 at 4:55 PM, Ista Zahn  wrote:
>> Hi Sean,
>> Comment in line below.
>>
>> On Thu, Oct 22, 2009 at 5:39 PM, Sean MacEachern  
>> wrote:
>>> Hi,
>>>
>>> I'm running into a problem subsetting a data frame that I have never
>>> encountered before:
>>>
 dim(chkPd)
>>> [1] 3213    6
>>>
 df = head(chkPd)
 df
>>>               PN        WB      Sire     Dam   MG SEX
>>> 601      1001  715349   61710   61702   67    F
>>> 969  1001_1  511092 616253 615037 168    F
>>> 986  1002_1  511082 616253 623905 168    F
>>> 667      1003  715617   61817   61441   67    F
>>> 1361 1003_1 510711 635246 627321 168    F
>>> 754       1004 715272   62356   61380  67     F
>>>
>>>
 dfb = chkPd[df$PN,]
 dfb
>>>            PN     WB   Sire    Dam  MG  SEX
>>> 1001    2114_1 510944 616294 614865 168    M
>>> NA             NA        NA 
>>> NA.1           NA        NA 
>>> 1003    1130_1 510950 616294 619694 168    F
>>> NA.2           NA        NA 
>>> 1004 2221-SHR2 510952 616294 619694 168    M
>>>
>>>
>>> I'm not sure why I'm getting this behaviour? By sub-setting the
>>> original data frame by PN I seem to be pulling out row numbers?
>>> Therefore I am only getting results where PN is less than the
>>> dimensions of the original data frame and of course nothing where PN
>>> has _ in the id. I have also tried using subset but haven't had any
>>> luck with that either.
>>
>> That is the documented behavior as far as I can tell. See
>>
>> ?"[.data.frame"
>>
>> Maybe my brain is going soft at the end of a long day, but I can't
>> tell what you're trying to do. Can you clarify?
>>
>> -Ista
>>
>>>
>>>
dfb = subset(chkPd, PN==df$PN)
>>> Warning message:
>>> In PN == df$PN :
>>>  longer object length is not a multiple of shorter object length
>>>
>>> I wasn't aware that both the larger data frame had to be a multiple of
>>> the object you were sub-setting . In any case I would appreciate any
>>> insight into what I may be doing wrong.
>>>
>>> Cheers,
>>>
>>> Sean
>>>
>>>
 sessionInfo()
>>> R version 2.9.1 (2009-06-26)
>>> i386-apple-darwin8.11.1
>>>
>>> locale:
>>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] splines   stats     graphics  grDevices utils     datasets  methods   
>>> base
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Ista Zahn
>> Graduate student
>> University of Rochester
>> Department of Clinical and Social Psychology
>> http://yourpsyche.org
>>
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org



Re: [R] help sub setting data frame

2009-10-22 Thread Sean MacEachern
Hi Ista,

I think I'm suffering long dayitis myself. You are probably right. I
don't use subset that often. I typically use brackets to subset
dataframes. Essentially what I am trying to do is take my original
dataframe (chkPd) and subset it using a smaller dataframe with some
matching PN IDs. They are only a few hundred rows different in size so
subset wouldn't be appropriate here. I'm just struggling to figure out
what's going wrong in my first example.
for instance if I try:
> df = data.frame('id'=c(1,2,3,4),'res'=c(10,10,20,20))
> dfb=df[1:2]
> dfc = df[dfb$id,]

I get something along the lines of what I'd expect where my new
dataframe is a subset of the original based on the matching ids I
specified in dfb$id. Is that wrong in my first example?

Cheers,

Sean

On Thu, Oct 22, 2009 at 4:55 PM, Ista Zahn  wrote:
> Hi Sean,
> Comment in line below.
>
> On Thu, Oct 22, 2009 at 5:39 PM, Sean MacEachern  
> wrote:
>> Hi,
>>
>> I'm running into a problem subsetting a data frame that I have never
>> encountered before:
>>
>>> dim(chkPd)
>> [1] 3213    6
>>
>>> df = head(chkPd)
>>> df
>>               PN        WB      Sire     Dam   MG SEX
>> 601      1001  715349   61710   61702   67    F
>> 969  1001_1  511092 616253 615037 168    F
>> 986  1002_1  511082 616253 623905 168    F
>> 667      1003  715617   61817   61441   67    F
>> 1361 1003_1 510711 635246 627321 168    F
>> 754       1004 715272   62356   61380  67     F
>>
>>
>>> dfb = chkPd[df$PN,]
>>> dfb
>>            PN     WB   Sire    Dam  MG  SEX
>> 1001    2114_1 510944 616294 614865 168    M
>> NA             NA        NA 
>> NA.1           NA        NA 
>> 1003    1130_1 510950 616294 619694 168    F
>> NA.2           NA        NA 
>> 1004 2221-SHR2 510952 616294 619694 168    M
>>
>>
>> I'm not sure why I'm getting this behaviour? By sub-setting the
>> original data frame by PN I seem to be pulling out row numbers?
>> Therefore I am only getting results where PN is less than the
>> dimensions of the original data frame and of course nothing where PN
>> has _ in the id. I have also tried using subset but haven't had any
>> luck with that either.
>
> That is the documented behavior as far as I can tell. See
>
> ?"[.data.frame"
>
> Maybe my brain is going soft at the end of a long day, but I can't
> tell what you're trying to do. Can you clarify?
>
> -Ista
>
>>
>>
>>>dfb = subset(chkPd, PN==df$PN)
>> Warning message:
>> In PN == df$PN :
>>  longer object length is not a multiple of shorter object length
>>
>> I wasn't aware that both the larger data frame had to be a multiple of
>> the object you were sub-setting . In any case I would appreciate any
>> insight into what I may be doing wrong.
>>
>> Cheers,
>>
>> Sean
>>
>>
>>> sessionInfo()
>> R version 2.9.1 (2009-06-26)
>> i386-apple-darwin8.11.1
>>
>> locale:
>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] splines   stats     graphics  grDevices utils     datasets  methods   
>> base
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org
>



Re: [R] help sub setting data frame

2009-10-22 Thread Ista Zahn
Hi Sean,
Comment in line below.

On Thu, Oct 22, 2009 at 5:39 PM, Sean MacEachern  wrote:
> Hi,
>
> I'm running into a problem subsetting a data frame that I have never
> encountered before:
>
>> dim(chkPd)
> [1] 3213    6
>
>> df = head(chkPd)
>> df
>               PN        WB      Sire     Dam   MG SEX
> 601      1001  715349   61710   61702   67    F
> 969  1001_1  511092 616253 615037 168    F
> 986  1002_1  511082 616253 623905 168    F
> 667      1003  715617   61817   61441   67    F
> 1361 1003_1 510711 635246 627321 168    F
> 754       1004 715272   62356   61380  67     F
>
>
>> dfb = chkPd[df$PN,]
>> dfb
>            PN     WB   Sire    Dam  MG  SEX
> 1001    2114_1 510944 616294 614865 168    M
> NA             NA        NA 
> NA.1           NA        NA 
> 1003    1130_1 510950 616294 619694 168    F
> NA.2           NA        NA 
> 1004 2221-SHR2 510952 616294 619694 168    M
>
>
> I'm not sure why I'm getting this behaviour? By sub-setting the
> original data frame by PN I seem to be pulling out row numbers?
> Therefore I am only getting results where PN is less than the
> dimensions of the original data frame and of course nothing where PN
> has _ in the id. I have also tried using subset but haven't had any
> luck with that either.

That is the documented behavior as far as I can tell. See

?"[.data.frame"

Maybe my brain is going soft at the end of a long day, but I can't
tell what you're trying to do. Can you clarify?

-Ista

>
>
>>dfb = subset(chkPd, PN==df$PN)
> Warning message:
> In PN == df$PN :
>  longer object length is not a multiple of shorter object length
>
> I wasn't aware that both the larger data frame had to be a multiple of
> the object you were sub-setting . In any case I would appreciate any
> insight into what I may be doing wrong.
>
> Cheers,
>
> Sean
>
>
>> sessionInfo()
> R version 2.9.1 (2009-06-26)
> i386-apple-darwin8.11.1
>
> locale:
> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] splines   stats     graphics  grDevices utils     datasets  methods   base
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org



Re: [R] Automatization of non-linear regression

2009-10-22 Thread Douglas Bates
Alternatively you could install the NRAIA package and collapse the
construction of the plot to a call to plotfit as shown in the
enclosed.

Note that this is a poor fit of a logistic growth model.  There are no
data values beyond the inflection point so the Asym parameter (the
asymptote) is poorly determined and the asymptote and the inflection
point on the x axis (the xmid parameter) will be highly correlated.
Furthermore the increasing variability is due to differences between
plants.  You can check with

xyplot(severity ~ time, data1, groups = plant, type = c("g", "b"))

so a nonlinear mixed-effects model may be more appropriate.  See
Pinheiro and Bates (Springer, 2000).
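A hedged sketch of such a fit, using the starting values reported by getInitial() earlier in the thread (a random asymptote per plant is only one plausible choice of random effect):

```r
library(nlme)

## Random Asym per plant; start values taken from the
## getInitial() output quoted elsewhere in this thread.
fm <- nlme(severity ~ SSlogis(time, Asym, xmid, scal),
           data = data1,
           fixed = Asym + xmid + scal ~ 1,
           random = Asym ~ 1 | plant,
           start = c(Asym = 2.212, xmid = 12.507, scal = 4.572))
summary(fm)
```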

2009/10/22 Peter Ehlers :
> Joel,
>
> The following should answer most of your questions:
>
>  time <- c(seq(0,10),seq(0,10),seq(0,10))
>  plant <- c(rep(1,11),rep(2,11),rep(3,11))
>  severity <- c(
>        42,51,59,64,76,93,106,125,149,171,199,
>        40,49,58,72,84,103,122,138,162,187,209,
>        41,49,57,71,89,112,146,174,218,250,288)/288
>
> # Suggestion: you don't need cbind() to make a dataframe:
>
>  data1 <- data.frame(time, plant, severity)
>
> # Plot severity versus time
> # you can avoid the awkward ..$.. by using with():
>
>  with(data1, plot(time, severity, type="n"))
>  with(data1, text(time, severity, plant))
>  title(main="Graph of severity vs time")
>
> # Now you use either the 'getInitial' approach and
> # transform your parameter estimates to your
> # preferred ones, or you can use SSlogis for your
> # model and transform the parameter estimates
> # afterwards:
>
> # Method 1:
> # 
>  ini <- getInitial(
>        severity ~ SSlogis(time, alpha, xmid, scale),
>        data = data1
>  )
>  ini <- unname(ini) ##to get rid of names
>
> # start vector in terms of alpha, beta, gamma:
>  para0.st <- c(
>         alpha = ini[1],
>         beta  = ini[2]/ini[3],
>         gamma = 1/ini[3]
>  )
>
>  fit0 <- nls(
>          severity~alpha/(1+exp(beta-gamma*time)),
>          data1,
>          start=para0.st,
>          trace=T
>  )
>
> # logistic curve on the scatter plot
>  co <- coef(fit0)  ##get the parameter estimates
>  curve(co[1]/(1+exp(co[2]-co[3]*x)), add=TRUE)
>
> # you don't need from, to, since the plot is already
> # set up and you don't want to restrict the curve;
>
> # personally, I prefer defining the function to be plotted:
>
>  f <- function(x, abc){
>    alpha <- abc[1]
>    beta  <- abc[2]
>    gamma <- abc[3]
>    alpha / (1 + exp(beta - gamma * x))
>  }
>
> # then you can plot the curve with:
>
>  curve(f(x, coef(fit0)), add = TRUE)
>
>
> # Method 2:
> # 
>  fit2 <- nls(severity ~ SSlogis(time, Asym, xmid, scal), data = data1)
>  co <- unname(coef(fit2))
>  abc <- c(co[1], co[2]/co[3], 1/co[3])
>  with(data1, plot(time, severity, type="n"))
>  with(data1, text(time, severity, plant))
>  title(main="Graph of severity vs time")
>  curve(f(x, abc), add = TRUE)
>
>
> Cheers,
> -Peter Ehlers
>
> Joel Fürstenberg-Hägg wrote:
>>
>> Hi everybody,
>>
>>
>> I'm using the method described here to make a linear regression:
>>
>>
>>
>> http://www.apsnet.org/education/advancedplantpath/topics/Rmodules/Doc1/05_Nonlinear_regression.html
>>
>>
>>>
>>> ## Input the data that include the variables time, plant ID, and severity
>>> time <- c(seq(0,10),seq(0,10),seq(0,10))
>>> plant <- c(rep(1,11),rep(2,11),rep(3,11))
>>>
>>> ## Severity represents the number of
>>> ## lesions on the leaf surface, standardized
>>> ## as a proportion of the maximum
>>> severity <- c(
>>
>> +         42,51,59,64,76,93,106,125,149,171,199,
>> +         40,49,58,72,84,103,122,138,162,187,209,
>> +         41,49,57,71,89,112,146,174,218,250,288)/288
>>>
>>> data1 <- data.frame(
>>
>> +         cbind(
>> +                 time,
>> +                 plant,
>> +                 severity
>> +         )
>> + )
>>>
>>> ## Plot severity versus time ## to see the relationship between## the two
>>> variables for each plant
>>> plot(
>>
>> +         data1$time,
>> +         data1$severity,
>> +         xlab="Time",
>> +         ylab="Severity",
>> +         type="n"
>> + )
>>>
>>> text(
>>
>> +         data1$time,
>> +         data1$severity,
>> +         data1$plant
>> + )
>>>
>>> title(main="Graph of severity vs time")
>>>
>>> getInitial(
>>
>> +         severity ~ SSlogis(time, alpha, xmid, scale),
>> +         data = data1
>> + )
>>     alpha      xmid     scale 
>>  2.212468 12.506960  4.572391
>>>
>>> ## Using the initial parameters above,
>>> ## fit the data with a logistic curve.
>>> para0.st <- c(
>>
>> +         alpha=2.212,
>> +         beta=12.507/4.572, # beta in our model is xmid/scale
>> +         gamma=1/4.572 # gamma (or r) is 1/scale
>> + )
>>>
>>> fit0 <- nls(
>>
>> +         severity~alpha/(1+exp(beta-gamma*time)),
>> +         data1,
>> +         start=para0.st,
>> +         trace=T
>> + )
>> 0.1621433 :  2.212     2.7355643 0.2187227
>> 0.1621427 :  2.2124095 2.7352979 0.2187056
>>>
>>> ## Plot to see how the model fits th

[R] help sub setting data frame

2009-10-22 Thread Sean MacEachern
Hi,

I'm running into a problem subsetting a data frame that I have never
encountered before:

> dim(chkPd)
[1] 3213    6

> df = head(chkPd)
> df
         PN     WB   Sire    Dam  MG SEX
601    1001 715349  61710  61702  67   F
969  1001_1 511092 616253 615037 168   F
986  1002_1 511082 616253 623905 168   F
667    1003 715617  61817  61441  67   F
1361 1003_1 510711 635246 627321 168   F
754    1004 715272  62356  61380  67   F


> dfb = chkPd[df$PN,]
> dfb
            PN     WB   Sire    Dam  MG  SEX
1001    2114_1 510944 616294 614865 168    M
NA        <NA>   <NA>   <NA>   <NA>  NA <NA>
NA.1      <NA>   <NA>   <NA>   <NA>  NA <NA>
1003    1130_1 510950 616294 619694 168    F
NA.2      <NA>   <NA>   <NA>   <NA>  NA <NA>
1004 2221-SHR2 510952 616294 619694 168    M


I'm not sure why I'm getting this behaviour? By sub-setting the
original data frame by PN I seem to be pulling out row numbers?
Therefore I am only getting results where PN is less than the
dimensions of the original data frame and of course nothing where PN
has _ in the id. I have also tried using subset but haven't had any
luck with that either.


>dfb = subset(chkPd, PN==df$PN)
Warning message:
In PN == df$PN :
  longer object length is not a multiple of shorter object length

I wasn't aware that the length of the longer vector had to be a multiple
of the length of the object you were subsetting with. In any case, I would
appreciate any insight into what I may be doing wrong.
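For reference, a small sketch of what is going on and the usual value-based
fixes (toy data here, since the original chkPd isn't available; it assumes PN
is a character column):

```r
## chkPd[df$PN, ] indexes rows by *position* (or row name), not by the value
## of the PN column -- which is why row numbers appear to be pulled out.
chkPd <- data.frame(PN  = c("1001", "1001_1", "1002_1", "1003"),
                    SEX = c("F", "F", "F", "M"),
                    stringsAsFactors = FALSE)
wanted <- c("1001_1", "1003")

## Value-based subsetting; %in% also avoids the '==' recycling warning:
sub1 <- chkPd[chkPd$PN %in% wanted, ]

## Or, to preserve the order of 'wanted', use match():
sub2 <- chkPd[match(wanted, chkPd$PN), ]
```

`%in%` works regardless of the relative lengths of the two vectors, which is
what `==` complained about.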

Cheers,

Sean


> sessionInfo()
R version 2.9.1 (2009-06-26)
i386-apple-darwin8.11.1

locale:
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods   base

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] PDF too large, PNG bad quality

2009-10-22 Thread Greg Snow
For getting the details in the outer points, here is what I do.

1. use hexbin to create the big central blob (but with additional info).  
2. use the chull function to find the outer points and save their indices in 
another vector
3. use chull on the rest of the points (excluding those found previously) and 
append their indices to the previous ones found.
4. repeat step 3 until have about 100-250 outer points (while loop works nicely)
5. use the points function to add just the outer points found above to the plot.

This gives a plot with the color/shade representing the density where the most 
points are, but also shows the individual points out on the edges, the only 
thing that is missed are possibly interesting points laying between peaks.
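A rough sketch of steps 2-5 above (Greg uses hexbin for the dense core; here
smoothScatter() from base R stands in for it, only so that points() can draw
directly on the same plot; the data are simulated):

```r
set.seed(1)
x <- rnorm(1e5); y <- rnorm(1e5)

## Peel off convex-hull layers until ~100 outer points are collected:
outer <- integer(0)
remaining <- seq_along(x)
while (length(outer) < 100 && length(remaining) > 2) {
  hull <- chull(x[remaining], y[remaining])  # hull of the points still left
  outer <- c(outer, remaining[hull])         # record this hull layer
  remaining <- remaining[-hull]              # and repeat on the rest
}

smoothScatter(x, y, nrpoints = 0)                # density shading for the blob
points(x[outer], y[outer], pch = 16, cex = 0.4)  # individual edge points
```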

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: b.rowling...@googlemail.com [mailto:b.rowling...@googlemail.com]
> On Behalf Of Barry Rowlingson
> Sent: Thursday, October 22, 2009 2:43 PM
> To: Greg Snow
> Cc: Lasse Kliemann; r-help@r-project.org
> Subject: Re: [R] PDF too large, PNG bad quality
> 
> On Thu, Oct 22, 2009 at 8:28 PM, Greg Snow  wrote:
> > The problem with the pdf files is that they are storing the
> information for every one of your points, even the ones that are
> overplotted by other points.  The png file is smaller because it only
> stores information on which color each pixel should be, not how many
> points contributed to a particular pixel being a given color.  But then
> png files convert the text to pixel information as well which don't
> look good if there is post scaling.
> >
> > If you want to go the pdf route, then you need to find some way to
> reduce redundant information while still getting the main points of the
> plot.  With so many point, I would suggest looking at the hexbin
> package (bioconductor I think) as one approach, it will not be an
> identical scatterplot, but will convey the information (possibly
> better) with much smaller graphics file sizes.  There are other tools
> like sunflower plots or others, but hexbin has worked well for me.
> >
> 
>  I've seen this kind of thing happen after waiting an hour for one of
> my printouts when queued after something submitted by one of our
> extreme value stats people. I've seen them make plots containing maybe
> a million points, most of which are in a big black blob, but they want
> to be able to show the important sixty or so points at the extremes.
> 
>  I'm not sure what the best way to print this kind of thing is - if
> they know where the big blob is going to be then they could apply some
> cutoff to the plot and only show points outside the cutoff, and fill
> the region inside the cutoff with a black polygon...
> 
>  Another idea may be to do a high resolution plot as a PNG (think 300
> pixels per inch of your desired final output) but do it without text
> and add that on later in a graphics package.
> 
> Barry



Re: [R] tapply with multiple arguments that are not part of the same data frame

2009-10-22 Thread Kavitha Venkatesan
I just realized my earlier post of my question below was not in
"Plain" Text mode, hence the repeat post...apologies!
Kavitha


On Thu, Oct 22, 2009 at 4:19 PM, Kavitha Venkatesan
 wrote:
> Hi all,
>
> I would like to invoke a function that takes multiple arguments (some of
> which are specified columns in the data frame, and others that are
> independent of the data frame) on split parts of a data frame, how do I do
> this?
>
> For example, let's say I have a data frame
>>fitness_data
> name  height  weight  country
> rob  5.8    200  usa
> nancy  5.5    140  germany
> jen   5.6    150  usa
> clark 5.10  210 germany
> matt 5.9 280 canada
> ralph    6   270 canada
> ...
> ...
>
> Now let us say I have a function,  my_func(h, w, noise, dir), which takes as
> input:
> (1) a vector of heights
> (2) a vector of weights
> (3) a user-input numeric "noise" value
> (4) a user-input string "dir" for the directory to output the end result of
> the function to
>
> This function does some calculations on the input data and outputs a
> dataframe that is then written to a file in the "dir" directory.
>
> If I want to apply this function to data grouped by each country in the
> "fitness_data" dataframe, how would I do this? I tried looking through the
> mailing archives, but couldn't nail down the solution. I tried something
> like
>
> split(mapply( function(a,b,c,d) my_func(fitness_data$h, fitness_data$w, 2.5,
> my_directory)), fitness_data$country)
>
> but this considered fitness_data$h, and fitness_data$w in each single row
> for a country, rather than a vector of heights or weights across all rows
> corresponding to that country.
>
> Thanks!
>
>
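One way to do this is to split the *row indices* by country and call the
function once per group; the sketch below uses a stand-in my_func, since the
real one isn't shown:

```r
fitness_data <- data.frame(
  name    = c("rob", "nancy", "jen", "clark", "matt", "ralph"),
  height  = c(5.8, 5.5, 5.6, 5.10, 5.9, 6),
  weight  = c(200, 140, 150, 210, 280, 270),
  country = c("usa", "germany", "usa", "germany", "canada", "canada")
)

## Stand-in for the poster's my_func(h, w, noise, dir):
my_func <- function(h, w, noise, dir) {
  data.frame(mean_weight = mean(w) + noise, n = length(h))
}

## Split row indices by country, then call my_func on each group's height
## and weight vectors; the extra arguments are passed along unchanged:
by_country <- lapply(
  split(seq_len(nrow(fitness_data)), fitness_data$country),
  function(idx) my_func(fitness_data$height[idx],
                        fitness_data$weight[idx],
                        noise = 0, dir = tempdir())
)
```

Splitting indices (rather than mapply over rows) is what makes whole vectors
per country reach the function.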



Re: [R] PDF too large, PNG bad quality

2009-10-22 Thread Barry Rowlingson
On Thu, Oct 22, 2009 at 8:28 PM, Greg Snow  wrote:
> The problem with the pdf files is that they are storing the information for 
> every one of your points, even the ones that are overplotted by other points. 
>  The png file is smaller because it only stores information on which color 
> each pixel should be, not how many points contributed to a particular pixel 
> being a given color.  But then png files convert the text to pixel 
> information as well which don't look good if there is post scaling.
>
> If you want to go the pdf route, then you need to find some way to reduce 
> redundant information while still getting the main points of the plot.  With 
> so many point, I would suggest looking at the hexbin package (bioconductor I 
> think) as one approach, it will not be an identical scatterplot, but will 
> convey the information (possibly better) with much smaller graphics file 
> sizes.  There are other tools like sunflower plots or others, but hexbin has 
> worked well for me.
>

 I've seen this kind of thing happen after waiting an hour for one of
my printouts when queued after something submitted by one of our
extreme value stats people. I've seen them make plots containing maybe
a million points, most of which are in a big black blob, but they want
to be able to show the important sixty or so points at the extremes.

 I'm not sure what the best way to print this kind of thing is - if
they know where the big blob is going to be then they could apply some
cutoff to the plot and only show points outside the cutoff, and fill
the region inside the cutoff with a black polygon...

 Another idea may be to do a high resolution plot as a PNG (think 300
pixels per inch of your desired final output) but do it without text
and add that on later in a graphics package.

Barry



[R] URGENT help re-beginning with R

2009-10-22 Thread Jose Narillos de Santos
Hi all,

I have just started using R again.

I need to take matrices Y and X (with the first row holding the variable
names), fit an OLS regression, and draw a two-panel graph: on top, the
scatter plot with the line representing the regression; below, the
residuals. On the same graph I also want small lines marking one standard
deviation from the regression.

I attach a graph to make this clearer.

Please, if you write me the code, explain how to put names on both plots.
Many thanks!

I saw a document explaining this but I can't find it.

Thanks in advance

Imagine you load the data file into a directory (for example, My Documents)
and that it is an .xls or *.txt file.
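As a starting point, here is a minimal sketch with made-up data (replace the
data.frame line with read.table()/read.csv() on your own file): it fits an
OLS regression, then stacks the scatter plot with the fitted line above the
residual plot, with a title on each panel and dashed lines at plus/minus one
residual standard deviation:

```r
set.seed(1)
dat <- data.frame(x = 1:30, y = 2 + 0.5 * (1:30) + rnorm(30))

fit <- lm(y ~ x, data = dat)          # OLS regression of y on x

op <- par(mfrow = c(2, 1))            # two panels, stacked vertically
plot(dat$x, dat$y, xlab = "X", ylab = "Y",
     main = "OLS fit")                # top: scatter plot ...
abline(fit)                           # ... with the regression line
plot(dat$x, resid(fit), xlab = "X", ylab = "Residual",
     main = "Residuals")              # bottom: residuals
abline(h = 0)
abline(h = c(-1, 1) * sd(resid(fit)), lty = 2)  # +/- 1 SD bands
par(op)
```

The `main=` argument is what puts a name on each plot; `xlab=`/`ylab=` label
the axes.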


Re: [R] Temperature Prediction Model

2009-10-22 Thread Clint Bowman

Aneeta,

If I understand the figure at
 this problem deals 
with sensors in a lab that is probably isolated from outdoor 
temperature changes.


I assume the predictive model must detect when a "rampaging 800 
pound gorilla" messes with a sensor.  Do we also have to detect the 
pawing of a "micro-mouse" as well?


The collected data also seem to have other parameters which would 
be valuable--are you limited to just temperature?


Clint

--
Clint BowmanINTERNET:   cl...@ecy.wa.gov
Air Quality Modeler INTERNET:   cl...@math.utah.edu
Department of Ecology   VOICE:  (360) 407-6815
PO Box 47600FAX:(360) 407-7534
Olympia, WA 98504-7600

On Thu, 22 Oct 2009, Thomas Adams wrote:


Aneeta,

You will have to have a seasonal component built into your model, because the 
seasonal variation does matter, particularly -where- you are geographically 
(San Diego, Chicago, Denver, Miami are very different). Generally, there is a 
sinusoidal daily temperature variation, but frontal passages and 
thunderstorms, etc., can and will disrupt this nice pattern. You may have to 
tie this into temperature predictions from a mesoscale numerical weather 
prediction model. Otherwise, you will end up with lots of misses and false 
alarms…
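As a hedged illustration of the sinusoidal daily pattern described above
(simulated readings, not the Intel lab data), a baseline time-of-day model
can be fit with nls() and large residuals flagged as potential tampering:

```r
set.seed(42)
hour <- seq(0, 48, by = 0.5)                      # two simulated days
temp <- 17 + 3 * sin(2 * pi * (hour - 9) / 24) +  # 24-h cycle
        rnorm(length(hour), sd = 0.1)             # sensor noise

## Fit mean level, amplitude, and phase of the daily cycle:
fit <- nls(temp ~ a + b * sin(2 * pi * (hour - phi) / 24),
           start = list(a = 16, b = 2.5, phi = 8))
coef(fit)   # roughly a = 17, b = 3, phi = 9 here

## Flag observations far from the fitted cycle:
suspect <- abs(resid(fit)) > 3 * sd(resid(fit))
```

Frontal passages and the like would of course violate this simple baseline,
as noted above.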


Regards,
Tom

Aneeta wrote:

 The data that I use has been collected by a sensor network deployed by
 Intel.
 You may take a look at the network at the following website
 http://db.csail.mit.edu/labdata/labdata.html

 The main goal of my project is to simulate a physical layer attack on a
 sensor network and to detect such an attack. In order to detect an attack
 I
 need to have a model that would define the normal behaviour. So the actual
 variation of temperature throughout the year is not very important out
 here.
 I have a set of data for a period of 7 days which is assumed to be the
 correct behaviour and I need to build a model upon that data. I may refine
 the model later on to take into account temperature variations throughout
 the year.

 Yes I am trying to build a model that will predict the temperature just on
 the given time of the day so that I am able to compare it with the
 observed
 temperature and determine if there is any abnormality. Each node should
 have
 its own expectation model (i.e. there will be no correlation between the
 readings of the different nodes).


 Steve Lianoglou-6 wrote:

>  Hi,
> 
>  On Oct 21, 2009, at 12:31 PM, Aneeta wrote:
> 
> 
> >  Greetings!
> > 
> >  As part of my research project I am using R to study temperature data
> >  collected by a network. Each node (observation point) records 
> >  temperature of

> >  its surroundings throughout the day and generates a dataset. Using the
> >  recorded datasets for the past 7 days I need to build a prediction 
> >  model for

> >  each node that would enable it to check the observed data against the
> >  predicted data. How can I derive an equation for temperature using the
> >  datasets?
> >  The following is a subset of one of the datasets:-
> > 
> >   Time  Temperature
> > 
> >  07:00:17.369668   17.509

> >  07:03:17.465725   17.509
> >  07:04:17.597071   17.509
> >  07:05:17.330544   17.509
> >  07:10:47.838123   17.5482
> >  07:14:16.680696   17.5874
> >  07:16:46.67457 17.5972
> >  07:29:16.887654   17.7442
> >  07:29:46.705759   17.754
> >  07:32:17.131713   17.7932
> >  07:35:47.113953   17.8324
> >  07:36:17.194981   17.8324
> >  07:37:17.227013   17.852
> >  07:38:17.809174   17.8618
> >  07:38:48.00011 17.852
> >  07:39:17.124362   17.8618
> >  07:41:17.130624   17.8912
> >  07:41:46.966421   17.901
> >  07:43:47.524823   17.95
> >  07:44:47.430977   17.95
> >  07:45:16.813396   17.95
> > 
>  I think you/we need much more information.
> 
>  Are you really trying to build a model that predicts the temperature 
>  just given the time of day?
> 
>  Given that you're in NY, I'd say 12pm in August sure feels much 
>  different than 12pm in February, no?
> 
>  Or are you trying to predict what one sensor readout would be at a 
>  particular time given readings from other sensors at the same time?
> 
>  Or ... ?
> 
>  -steve
> 
>  --

>  Steve Lianoglou
>  Graduate Student: Computational Systems Biology
> |   Memorial Sloan-Kettering Cancer Center
> |   Weill Medical College of Cornell University
>  Contact Info: http://cbio.mskcc.org/~lianos/contact
> 
> 
> 
> 







[R] tapply with multiple arguments that are not part of the same data frame

2009-10-22 Thread Kavitha Venkatesan
Hi all,

I would like to invoke a function that takes multiple arguments (some of
which are specified columns in the data frame, and others that are
independent of the data frame) on split parts of a data frame, how do I do
this?

For example, let's say I have a data frame
>fitness_data
name  height  weight  country
rob     5.8   200  usa
nancy   5.5   140  germany
jen     5.6   150  usa
clark   5.10  210  germany
matt    5.9   280  canada
ralph   6     270  canada
...
...

Now let us say I have a function,  my_func(h, w, noise, dir), which takes as
input:
(1) a vector of heights
(2) a vector of weights
(3) a user-input numeric "noise" value
(4) a user-input string "dir" for the directory to output the end result of
the function to

This function does some calculations on the input data and outputs a
dataframe that is then written to a file in the "dir" directory.

If I want to apply this function to data grouped by each country in the
"fitness_data" dataframe, how would I do this? I tried looking through the
mailing archives, but couldn't nail down the solution. I tried something
like

split(mapply( function(a,b,c,d) my_func(fitness_data$h, fitness_data$w, 2.5,
my_directory)), fitness_data$country)

but this considered fitness_data$h, and fitness_data$w in each single row
for a country, rather than a vector of heights or weights across all rows
corresponding to that country.

Thanks!

[[alternative HTML version deleted]]



[R] intervals package dependence on R 2.9.0

2009-10-22 Thread Ross Boylan
I notice that the intervals package indicates a dependence on R >=
2.9.0.  Is there some feature of R 2.9 that intervals depends on, or
might it work with R 2.7.1, which I am running?

Thanks.
Ross Boylan



Re: [R] data frame is killing me! help

2009-10-22 Thread Steve Lianoglou

Hi,

On Oct 22, 2009, at 2:35 PM, bbslover wrote:


Usage
data(gasoline)
Format
A data frame with 60 observations on the following 2 variables.
octane
a numeric vector. The octane number.
NIR
a matrix with 401 columns. The NIR spectrum

and I see the gasoline data to see below
NIR.1686 nm NIR.1688 nm NIR.1690 nm NIR.1692 nm NIR.1694 nm NIR.1696  
nm

NIR.1698 nm NIR.1700 nm
1 1.242645 1.250789 1.246626 1.250985 1.264189 1.244678 1.245913  
1.221135
2 1.189116 1.223242 1.253306 1.282889 1.215065 1.225211 1.227985  
1.198851
3 1.198287 1.237383 1.260979 1.276677 1.218871 1.223132 1.230321  
1.208742
4 1.201066 1.233299 1.262966 1.272709 1.211068 1.215044 1.232655  
1.206696
5 1.259616 1.273713 1.296524 1.299507 1.226448 1.230718 1.232864  
1.202926
6 1.24109 1.262138 1.288401 1.291118 1.229769 1.227615 1.22763  
1.207576
7 1.245143 1.265648 1.274731 1.292441 1.218317 1.218147 1.73  
1.200446
8 1.222581 1.245782 1.26002 1.290305 1.221264 1.220265 1.227947  
1.188174
9 1.234969 1.251559 1.272416 1.287405 1.211995 1.213263 1.215883  
1.196102


look at this NIR.1686 nm NIR.1688 nm NIR.1690 nm NIR.1692 nm NIR. 
1694 nm

NIR.1696 nm NIR.1698 nm NIR.1700 nm

how can I add letters NIR to my variable, because my 600  
independents never

have NIR as the prefix. however, it is needed to model the plsr.   for
example aa=plsr(y~NIR, data=data ,), the prefix NIR is  
necessary, how

can I do with it?


I'm not really sure that I'm getting you, but if your problem is that
the column names of your data.frame don't match the variable names
you'd like to use in your formula, just change the colnames of your
data.frame to match your formula.


BTW - I have no idea where to get this gasoline data set, so I'm just  
imagining:


eg.
colnames(gasoline) <- c('put', 'the', 'variable', 'names', 'that',  
'you', 'want', 'here')


-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] Bayesian regression stepwise function?

2009-10-22 Thread Rolf Turner


On 23/10/2009, at 8:20 AM, Peter Flom wrote:


Frank E Harrell Jr  wrote

Ben Bolker wrote:



Allan.Y wrote:

Hi everyone,

I am wondering if there exists a stepwise regression function  
for the

Bayesian regression model.  I tried googling, but I couldn't find
anything.  I know "step" function exists for regular stepwise  
regression,

but nothing for Bayes.



Why?  That seems so ... un-Bayesian ...


Exactly.  I hope it doesn't exist.  The beauty of Bayes is shrinkage,
borrowing of information, and statement of results in an intuitive  
way.



Yeah.
Asking for stepwise in Bayesian analysis is like asking for some  
nuclear waste on your ice cream sundae.


More like asking for petro-chemical waste in your nuclear waste!

cheers,

Rolf Turner




[R] data frame is killing me! help

2009-10-22 Thread bbslover

Usage 
data(gasoline) 
Format 
A data frame with 60 observations on the following 2 variables. 
octane 
a numeric vector. The octane number. 
NIR 
a matrix with 401 columns. The NIR spectrum

and I see the gasoline data to see below
NIR.1686 nm NIR.1688 nm NIR.1690 nm NIR.1692 nm NIR.1694 nm NIR.1696 nm
NIR.1698 nm NIR.1700 nm 
1 1.242645 1.250789 1.246626 1.250985 1.264189 1.244678 1.245913 1.221135 
2 1.189116 1.223242 1.253306 1.282889 1.215065 1.225211 1.227985 1.198851 
3 1.198287 1.237383 1.260979 1.276677 1.218871 1.223132 1.230321 1.208742 
4 1.201066 1.233299 1.262966 1.272709 1.211068 1.215044 1.232655 1.206696 
5 1.259616 1.273713 1.296524 1.299507 1.226448 1.230718 1.232864 1.202926 
6 1.24109 1.262138 1.288401 1.291118 1.229769 1.227615 1.22763 1.207576 
7 1.245143 1.265648 1.274731 1.292441 1.218317 1.218147 1.73 1.200446 
8 1.222581 1.245782 1.26002 1.290305 1.221264 1.220265 1.227947 1.188174 
9 1.234969 1.251559 1.272416 1.287405 1.211995 1.213263 1.215883 1.196102

look at this NIR.1686 nm NIR.1688 nm NIR.1690 nm NIR.1692 nm NIR.1694 nm
NIR.1696 nm NIR.1698 nm NIR.1700 nm 

how can I add the prefix NIR to my variables? My 600 independent variables
never have NIR as a prefix; however, it is needed to fit the plsr model, for
example aa = plsr(y ~ NIR, data = data, ...). The prefix NIR is necessary, so
how can I do this?
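For what it's worth, in the gasoline data NIR is a single *matrix-valued
column* of the data frame, not 401 separate columns. A sketch of building
the same structure from your own predictors (random stand-in numbers here):

```r
set.seed(1)
spectra <- matrix(rnorm(60 * 401), nrow = 60)  # stand-in for your predictors
y <- rnorm(60)                                 # stand-in for your response

mydata <- data.frame(y = y)
mydata$NIR <- spectra        # assigns the whole matrix as ONE column
str(mydata)                  # 2 "variables": y and NIR (a 60 x 401 matrix)

## then, assuming the pls package is loaded:
## fit <- plsr(y ~ NIR, data = mydata, ncomp = 5)
```

The column name in the formula just has to match the matrix column's name;
it does not have to be "NIR".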
-- 
View this message in context: 
http://www.nabble.com/data-frame-is-killing-me%21-help-tp26015079p26015079.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Automatization of non-linear regression

2009-10-22 Thread Peter Ehlers

Joel,

The following should answer most of your questions:

  time <- c(seq(0,10),seq(0,10),seq(0,10))
  plant <- c(rep(1,11),rep(2,11),rep(3,11))
  severity <- c(
42,51,59,64,76,93,106,125,149,171,199,
40,49,58,72,84,103,122,138,162,187,209,
41,49,57,71,89,112,146,174,218,250,288)/288

# Suggestion: you don't need cbind() to make a dataframe:

  data1 <- data.frame(time, plant, severity)

# Plot severity versus time
# you can avoid the awkward ..$.. by using with():

  with(data1, plot(time, severity, type="n"))
  with(data1, text(time, severity, plant))
  title(main="Graph of severity vs time")

# Now you use either the 'getInitial' approach and
# transform your parameter estimates to your
# preferred ones, or you can use SSlogis for your
# model and transform the parameter estimates
# afterwards:

# Method 1:
# 
  ini <- getInitial(
severity ~ SSlogis(time, alpha, xmid, scale),
data = data1
  )
  ini <- unname(ini) ##to get rid of names

# start vector in terms of alpha, beta, gamma:
  para0.st <- c(
 alpha = ini[1],
 beta  = ini[2]/ini[3],
 gamma = 1/ini[3]
  )

  fit0 <- nls(
  severity~alpha/(1+exp(beta-gamma*time)),
  data1,
  start=para0.st,
  trace=T
  )

# logistic curve on the scatter plot
  co <- coef(fit0)  ##get the parameter estimates
  curve(co[1]/(1+exp(co[2]-co[3]*x)), add=TRUE)

# you don't need from, to, since the plot is already
# set up and you don't want to restrict the curve;

# personally, I prefer defining the function to be plotted:

  f <- function(x, abc){
alpha <- abc[1]
beta  <- abc[2]
gamma <- abc[3]
alpha / (1 + exp(beta - gamma * x))
  }

# then you can plot the curve with:

  curve(f(x, coef(fit0)), add = TRUE)


# Method 2:
# 
  fit2 <- nls(severity ~ SSlogis(time, Asym, xmid, scal), data = data1)
  co <- unname(coef(fit2))
  abc <- c(co[1], co[2]/co[3], 1/co[3])
  with(data1, plot(time, severity, type="n"))
  with(data1, text(time, severity, plant))
  title(main="Graph of severity vs time")
  curve(f(x, abc), add = TRUE)


Cheers,
-Peter Ehlers

Joel Fürstenberg-Hägg wrote:

Hi everybody,

 


I'm using the method described here to make a linear regression:

 


http://www.apsnet.org/education/advancedplantpath/topics/Rmodules/Doc1/05_Nonlinear_regression.html

 


## Input the data that include the variables time, plant ID, and severity
time <- c(seq(0,10),seq(0,10),seq(0,10))
plant <- c(rep(1,11),rep(2,11),rep(3,11))

## Severity represents the number of
## lesions on the leaf surface, standardized
## as a proportion of the maximum
severity <- c(

+ 42,51,59,64,76,93,106,125,149,171,199,
+ 40,49,58,72,84,103,122,138,162,187,209,
+ 41,49,57,71,89,112,146,174,218,250,288)/288

data1 <- data.frame(

+ cbind(
+ time,
+ plant,
+ severity
+ )
+ )
## Plot severity versus time
## to see the relationship between the two variables for each plant

plot(

+ data1$time,
+ data1$severity,
+ xlab="Time",
+ ylab="Severity",
+ type="n"
+ )

text(

+ data1$time,
+ data1$severity,
+ data1$plant
+ )

title(main="Graph of severity vs time")

getInitial(

+ severity ~ SSlogis(time, alpha, xmid, scale),
+ data = data1
+ )
alpha  xmid scale 
 2.212468 12.506960  4.572391 


## Using the initial parameters above,
## fit the data with a logistic curve.
para0.st <- c(

+ alpha=2.212,
+ beta=12.507/4.572, # beta in our model is xmid/scale
+ gamma=1/4.572 # gamma (or r) is 1/scale
+ )

fit0 <- nls(

+ severity~alpha/(1+exp(beta-gamma*time)),
+ data1,
+ start=para0.st,
+ trace=T
+ )
0.1621433 :  2.212 2.7355643 0.2187227 
0.1621427 :  2.2124095 2.7352979 0.2187056 

## Plot to see how the model fits the data; plot the
## logistic curve on a scatter plot
plot(

+ data1$time,
+ data1$severity,
+ type="n"
+ )

text(

+ data1$time,
+ data1$severity,
+ data1$plant
+ )

title(main="Graph of severity vs time")

curve(

+ 2.21/(1+exp(2.74-0.22*x)),
+ from=time[1],
+ to=time[11],
+ add=TRUE
+ )


As you can see I have to do some work manually, such as setting the numbers to 
be used for calculation of alpha, beta and gamma. I wonder if you might have an 
idea how to automatize this? I suppose it should be possible to save the output 
from getInitial() and reach the elements via index or something, but how?

 


I guess a similar approach could be used for the values of fit0?

 


Or even better, could the variables alpha, beta and gamma be used directly,
for instance in curve(), instead of typing the values in manually? Just
exchanging the values for the variables (alpha instead of 2.21, etc.) doesn't
seem to work. What is the reason for that? Any solution?

Re: [R] Intersection an Sum between list and matrix

2009-10-22 Thread Romildo Martins
Sorry... example 2 was error!

2009/10/22 Romildo Martins 

> Hello,
>
> I need to do an intersection between the list elements (partitionslist) and
> the columns and rows of a matrix (mm), so that the result will be the sums
> of the rows and columns.
>
>
> Thanks a lot,
>
> Romildo Martins
>
> Example
>
> 1. The intersection and sum between partitionslist[[1]][[2]] and mm is
> indicated in bold.
> 2. The intersection and sum between partitionslist[[1]][[2]] and mm is
> indicated in red italic.
>
> > partitionslist[[1]][[1]]
>  [1]  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 19 20 21
> > partitionslist[[1]][[2]]
> [1]  0  1  2 18 22
> > mm
>        1     4     7    10    12    14    19    22
> 1  0 18128 18576 20048 19408 21472 16528 20432
> 4  20080 0 20576 23520 19776 19504 21312 22384
> 7  21072 25456 0 18448 19152 22144 18368 19280
> 10 18624 22880 16256 0 17856 16032 17008 19120
> 12 20208 15712 17008 23264 0 23168 19872 24000
> 14 26560 19024 20704 19520 20048 0 16736 21056
> 19 17600 22640 20704 17200 17312 17728 0 18912
> 22 18128 19024 21120 17296 20208 19904 21120 0
>
>
> > example 1
>        1     4     7    10    12    14    19    22
> 1  0 18128 18576 20048 19408 21472 16528 *20432*
> 4  20080 0 20576 23520 19776 19504 21312 22384
> 7  21072 25456 0 18448 19152 22144 18368 19280
> 10 18624 22880 16256 0 17856 16032 17008 19120
> 12 20208 15712 17008 23264 0 23168 19872 24000
> 14 26560 19024 20704 19520 20048 0 16736 21056
> 19 17600 22640 20704 17200 17312 17728 0 18912
> 22 *18128* 19024 21120 17296 20208 19904 21120 0
>
> example 1: sum is 20432 + 18128
>
> example 2
>        1     4     7    10    12    14    19    22
> *1  *0* *18128 18576 20048 19408 21472 16528 20432*
> 4  *20080* *0* 20576 23520 19776 19504 21312 *22384*
> 7  *21072 *25456 *0 *18448 19152 22144 18368 *19280*
> 10 *18624* 22880 16256 *0* 17856 16032 17008 *19120*
> 12 *20208* 15712 17008 23264 *0* 23168 19872 *24000*
> 14 *26560* 19024 20704 19520 20048 *0* 16736 21056
> 19 *17600 *22640 20704 17200 17312 17728 *0* *18912*
> 22 *18128 19024 21120 17296 20208 19904 21120 0
>
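If the labels (1, 4, 7, ...) are stored as the dimnames of mm, the example-1
sum (the entries of mm whose row and column labels both lie in a partition
element) can be computed as below; the small mm here is a stand-in for the
real matrix, keeping only the rows/columns needed for the example:

```r
labs <- c("1", "4", "19", "22")
mm <- matrix(c(    0, 18128, 16528, 20432,
               20080,     0, 21312, 22384,
               17600, 22640,     0, 18912,
               18128, 19024, 21120,     0),
             nrow = 4, byrow = TRUE, dimnames = list(labs, labs))

part <- c(0, 1, 2, 18, 22)                           # partitionslist[[1]][[2]]
keep <- intersect(as.character(part), rownames(mm))  # "1" and "22"
sum(mm[keep, keep])                                  # 20432 + 18128 (diag is 0)
```

Looping this over the elements of partitionslist gives one block sum per
partition element.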

[[alternative HTML version deleted]]



Re: [R] PDF too large, PNG bad quality

2009-10-22 Thread Greg Snow
The problem with the pdf files is that they are storing the information for 
every one of your points, even the ones that are overplotted by other points.  
The png file is smaller because it only stores information on which color each 
pixel should be, not how many points contributed to a particular pixel being a 
given color.  But then png files convert the text to pixel information as well 
which don't look good if there is post scaling.

If you want to go the pdf route, then you need to find some way to reduce 
redundant information while still getting the main points of the plot.  With so 
many point, I would suggest looking at the hexbin package (bioconductor I 
think) as one approach, it will not be an identical scatterplot, but will 
convey the information (possibly better) with much smaller graphics file sizes. 
 There are other tools like sunflower plots or others, but hexbin has worked 
well for me.

If you want to go the png route, the problem usually comes from scaling the 
plot after producing it.  So, the solution is to create the plot at the exact 
size and at the exact resolution that you want to use it at in your document so 
that no scaling needs to be done.  Use the png function, but don't accept the 
defaults, choose the size and resolution.  If you later decide on a different 
size of graph, recreate the file, don't let LaTeX rescale the first one.
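For example (the size and resolution here are placeholders; match them to the
figure width you will use in LaTeX):

```r
## Create the PNG at its final size and resolution, so LaTeX never rescales:
png("scatter.png", width = 5, height = 4, units = "in", res = 300,
    pointsize = 10)
plot(rnorm(1e5), rnorm(1e5), pch = ".")
dev.off()
## In the document: \includegraphics[width=5in]{scatter}  (same width)
```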

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Lasse Kliemann
> Sent: Thursday, October 22, 2009 1:07 PM
> To: r-help@r-project.org
> Subject: [R] PDF too large, PNG bad quality
> 
> I wish to save a scatter plot comprising approx. 2 million points
> in order to include it in a LaTeX document.
> 
> Using 'pdf(...)' produces a file of size about 20 MB, which is
> useless.
> 
> Using 'cairo_pdf(...)' produces a smaller file, around 3 MB. This
> is still too large. Not only that the document will be too large,
> but also PDF viewers choke on this. Moreover, Cairo has problems
> with text: by default text looks ugly, like scaled bitmaps. After
> hours of trying different settings, I discovered that choosing a
> different font family can help, e.g.: 'par(family="Mono")'. This
> gives good-looking text. Yet, the problem with the file size
> remains.
> 
> There exists the hint to produdc EPS instead and then convert to
> PDF using 'epstopdf'. The resulting PDF files are slightly
> smaller, but still too large, and PDF viewers still don't like
> it.
> 
> So I gave PNG a try. PNG files are much smaller and PDF viewers
> have no trouble with them. However, fonts look ugly. The same
> trick that worked for Cairo PDF has no effect for PNG. When I
> view the PNGs with a dedicated viewer like 'qiv', even the fonts
> look good. But not when included in LaTeX; I simply use
> '\includegraphics{...}' and run the document through 'pdflatex'.
> 
> I tried both, creating PNG with 'png(...)' and converting from
> PDF to PNG using 'convert' from ImageMagick.
> 
> So my questions are:
> 
> - Is there a way to produce sufficiently lean PDFs directly in R,
>   even when the plot comprises several million points?
> 
> - How to produce a PNG that still looks nice when included in a
>   LaTeX PDF document?
> 
> Any hints will be greatly appreciated.
> 
> Thank you
> Lasse

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Intersection and Sum between list and matrix

2009-10-22 Thread Romildo Martins
Hello,

I need to do an intersection between the list elements (partitionslist) and
the columns and rows of a matrix (mm), so that the result will be the sums
of the rows and columns.


Thanks a lot,

Romildo Martins

Example

1. The intersection and sum between partitionslist[[1]][[2]] and mm is
indicated in bold.
2. The intersection and sum between partitionslist[[1]][[2]] and mm is
indicated in red italic.

> partitionslist[[1]][[1]]
 [1]  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 19 20 21
> partitionslist[[1]][[2]]
[1]  0  1  2 18 22
> mm
       1     4     7    10    12    14    19    22
1  0 18128 18576 20048 19408 21472 16528 20432
4  20080 0 20576 23520 19776 19504 21312 22384
7  21072 25456 0 18448 19152 22144 18368 19280
10 18624 22880 16256 0 17856 16032 17008 19120
12 20208 15712 17008 23264 0 23168 19872 24000
14 26560 19024 20704 19520 20048 0 16736 21056
19 17600 22640 20704 17200 17312 17728 0 18912
22 18128 19024 21120 17296 20208 19904 21120 0


> example 1
       1     4     7    10    12    14    19    22
1  0 18128 18576 20048 19408 21472 16528 *20432*
4  20080 0 20576 23520 19776 19504 21312 22384
7  21072 25456 0 18448 19152 22144 18368 19280
10 18624 22880 16256 0 17856 16032 17008 19120
12 20208 15712 17008 23264 0 23168 19872 24000
14 26560 19024 20704 19520 20048 0 16736 21056
19 17600 22640 20704 17200 17312 17728 0 18912
22 *18128* 19024 21120 17296 20208 19904 21120 0

example 1: sum is 20432 + 18128

> example 2
       1     4     7    10    12    14    19    22
*1  *0* 18128 18576 20048 19408 21472 16528 *20432*
4  20080 *0* 20576 23520 19776 19504 21312 22384
7  21072 25456 *0 *18448 19152 22144 18368 19280
10 18624 22880 16256 *0* 17856 16032 17008 19120
12 20208 15712 17008 23264 *0* 23168 19872 24000
14 26560 19024 20704 19520 20048 *0* 16736 21056
19 17600 22640 20704 17200 17312 17728 *0* 18912
22 *18128 *19024 21120 17296 20208 19904 21120 *0

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple extraction of level,x,y from contourLines()

2009-10-22 Thread Alberto Monteiro
Ted Harding asked:
>
> I can of course get these individually with, for the 5th one for
> instance,
> 
>   C.W[[5]]$level
>   C.W[[5]]$x
>   C.W[[5]]$y
> 
> But I can't see how to obtain, in one line and without running
> a nasty loop, to get all the levels at once!
> 
> In other words, I'm looking for an expression which will return
> the vector
> 
>   c(C.W[[1]]$level,C.W[[2]]$level,...,C.W[[28]]$level)
> 
H...

Did you try this?

# reproducible example
C.W <- list(list(level = 1, x = 2), list(level = 2, y = 3), list(level = 10, 
z = 4))
sapply(C.W, function(x) x$level)

Alberto Monteiro

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bayesian regression stepwise function?

2009-10-22 Thread Peter Flom
Frank E Harrell Jr  wrote
>Ben Bolker wrote:
>> 
>> 
>> Allan.Y wrote:
>>> Hi everyone,
>>>
>>> I am wondering if there exists a stepwise regression function for the
>>> Bayesian regression model.  I tried googling, but I couldn't find
>>> anything.  I know "step" function exists for regular stepwise regression,
>>> but nothing for Bayes.
>>>
>> 
>> Why?  That seems so ... un-Bayesian ...
>
>Exactly.  I hope it doesn't exist.  The beauty of Bayes is shrinkage, 
>borrowing of information, and statement of results in an intuitive way.


Yeah.
Asking for stepwise in Bayesian analysis is like asking for some nuclear waste 
on your ice cream sundae.

Peter

Peter L. Flom, PhD
Statistical Consultant
Website: www DOT peterflomconsulting DOT com
Writing; http://www.associatedcontent.com/user/582880/peter_flom.html
Twitter:   @peterflom

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] PDF too large, PNG bad quality

2009-10-22 Thread Benilton Carvalho

Dear Lasse,

This won't answer your specific questions and I apologize for that.  
AFAIK, pdf() produces uncompressed PDFs only. But you could use tools  
like pdftk to compress your PDFs. About the PNGs, you can always set  
the 'res' argument to improve resolution, but it won't beat the PDFs.


I'm not sure the reader (of your LaTeX document) would be interested  
in seeing each of the 2M points of your scatter plot and, even if he  
was, I doubt he could. So, instead of plot(x, y), have you considered  
using:


smoothScatter(x, y)

?
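
For example (a sketch with simulated data standing in for your x and y):

```r
## simulated stand-ins for the real data
x <- rnorm(2e6); y <- x + rnorm(2e6)

pdf("points.pdf"); plot(x, y, pch = "."); dev.off()  # one path per point: huge file
pdf("smooth.pdf"); smoothScatter(x, y); dev.off()    # one raster density image: small file
```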

Cheers,

b

On Oct 22, 2009, at 5:07 PM, Lasse Kliemann wrote:


I wish to save a scatter plot comprising approx. 2 million points
in order to include it in a LaTeX document.

Using 'pdf(...)' produces a file of size about 20 MB, which is
useless.

Using 'cairo_pdf(...)' produces a smaller file, around 3 MB. This
is still too large. Not only that the document will be too large,
but also PDF viewers choke on this. Moreover, Cairo has problems
with text: by default text looks ugly, like scaled bitmaps. After
hours of trying different settings, I discovered that choosing a
different font family can help, e.g.: 'par(family="Mono")'. This
gives good-looking text. Yet, the problem with the file size
remains.

There exists the hint to produce EPS instead and then convert to
PDF using 'epstopdf'. The resulting PDF files are slightly
smaller, but still too large, and PDF viewers still don't like
it.

So I gave PNG a try. PNG files are much smaller and PDF viewers
have no trouble with them. However, fonts look ugly. The same
trick that worked for Cairo PDF has no effect for PNG. When I
view the PNGs with a dedicated viewer like 'qiv', even the fonts
look good. But not when included in LaTeX; I simply use
'\includegraphics{...}' and run the document through 'pdflatex'.

I tried both, creating PNG with 'png(...)' and converting from
PDF to PNG using 'convert' from ImageMagick.

So my questions are:

- Is there a way to produce sufficiently lean PDFs directly in R,
 even when the plot comprises several million points?

- How to produce a PNG that still looks nice when included in a
 LaTeX PDF document?

Any hints will be greatly appreciated.

Thank you
Lasse



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Diffusion of particles inside a sphere

2009-10-22 Thread cls59


carferper wrote:
> 
> Hello veryone,
> 
> I am interested in the diffusion of particles inside a sphere, and its
> release through a small pore on the sphere surface. Unfortunately, I have
> not found the way to do this in R. Could you help me?
> 
> Thank very much in advance for your help
> 
> 

I have used R for very similar types of simulations involving the movement
of groundwater through porous media-- so R definitely has all the tools to
set up and solve these types of models. It also makes it easy to obtain nice
plots of the solution.

However, if you are asking "Is there a ready-made model for situation X?"
the answer is most likely "No, you will have to build it yourself". At the
minimum you will need to:

  * Define the governing equation that describes the situation you are
modeling.
  * Define any initial or boundary conditions that apply to the situation.
  * Choose an appropriate numerical or analytical solution method to solve
the governing equation subject to the initial and boundary conditions.

At that point, you may have questions such as "Are there any packages that
implement numerical/analytical method Y?" or "I am trying to implement
numerical/analytical scheme Y in R, but it isn't working because the
calculation of Z appears to be wrong. Can you help?". These sorts of
questions are specific enough that the members of this list could probably
offer advice on how to implement the nuts and bolts of the operations.
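
As an illustration of the build-it-yourself route, here is a crude random-walk
sketch. Everything in it is an assumption for the sake of example: isotropic
Gaussian steps, a unit sphere, an absorbing pore of given half-angle around
the +z pole, and particles that overshoot the wall are simply projected back
onto it rather than properly reflected:

```r
set.seed(1)
n     <- 1000          # number of particles, all starting at the centre
steps <- 2000
sigma <- 0.05          # step standard deviation
pore  <- 0.2           # pore half-angle in radians, centred on the +z pole

pos <- matrix(0, nrow = n, ncol = 3)
out <- logical(n)      # TRUE once a particle has left through the pore

for (s in seq_len(steps)) {
  act <- which(!out)
  if (!length(act)) break
  ## random Gaussian displacement for every active particle
  pos[act, ] <- pos[act, ] + matrix(rnorm(3 * length(act), sd = sigma), ncol = 3)
  r   <- sqrt(rowSums(pos[act, , drop = FALSE]^2))
  hit <- act[r > 1]                        # particles that crossed the wall
  if (length(hit)) {
    theta <- acos(pos[hit, 3] / r[r > 1])  # polar angle at the wall
    esc   <- hit[theta < pore]             # inside the pore: released
    out[esc] <- TRUE
    back <- setdiff(hit, esc)              # project the rest back onto the wall
    pos[back, ] <- pos[back, , drop = FALSE] /
      sqrt(rowSums(pos[back, , drop = FALSE]^2))
  }
}
mean(out)  # fraction of particles released so far
```

A real model would of course replace these choices with the governing
equation and boundary conditions of the actual problem.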

Good luck!

-Charlie

-
Charlie Sharpsteen
Undergraduate
Environmental Resources Engineering
Humboldt State University
-- 
View this message in context: 
http://www.nabble.com/Diffusion-of-particles-inside-a-sphere-tp26010166p26015673.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] What happen for Negative binomial link in Lmer fonction?

2009-10-22 Thread Douglas Bates
On Thu, Oct 22, 2009 at 1:39 PM, Ben Bolker  wrote:

> ROBARDET Emmanuelle wrote:
>>
>> Dear R users,
>> I'm performing some GLMMs analysis with a negative binomial link.
>> I already performed such analysis some months ago with the lmer() function
>> but when I tried it today I encountered this problem:
>> Erreur dans famType(glmFit$family) : unknown GLM family: 'Negative
>> Binomial'
>>
>> Does anyone know if the negative binomial family has been removed from
>> this function?
>> I really appreciate any response.
>> Emmanuelle
>>
>>
>
> I would be extremely surprised if this worked in the past; to
> the best of my knowledge the negative binomial family has
> never been implemented in lmer.

I too would be extremely surprised if it had worked in the past,
considering that I have never implemented it.

I did exchange email with Bill Venables about it and we formulated
what seems to be a reasonable approach but it hasn't made it to the
top of the "To Do" list yet.  Right now the big push is on code to
profile the log-likelihood with respect to the parameters so we can
actually get confidence intervals and, the holy grail of
mixed-modeling, p-values.
> One could in principle
> do a glmmPQL fit with the negative binomial family
> (with a fixed value of the overdispersion parameter).
> glmmADMB is another option.
> Can you say which version etc. you were using???
>
> Follow-ups should probably be sent to r-sig-mixed-mod...@r-project.org 
> --
> View this message in context: 
> http://www.nabble.com/What-happen-for-Negative-binomial-link-in-Lmer-fonction--tp26013041p26015140.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple extraction of level,x,y from contourLines()

2009-10-22 Thread Ted Harding
On 22-Oct-09 19:03:06, Duncan Murdoch wrote:
> On 22/10/2009 2:57 PM, (Ted Harding) wrote:
>> A follow-up to my previous query.
>> 
>> Say one of the results returned by contourLines() is
>> 
>>   C.W <- contourLines()
>> 
>> Then C.W is a list of (in this case 28) lists,
>> each of which is a list with components $level (a single number),
>> $x (the x coords of the points on a contour at that level) and
>> $y (the y coords).
>> 
>> I can of course get these individually with, for the 5th one for
>> instance,
>> 
>>   C.W[[5]]$level
>>   C.W[[5]]$x
>>   C.W[[5]]$y
>> 
>> But I can't see how to obtain, in one line and without running
>> a nasty loop, to get all the levels at once!
>> 
>> In other words, I'm looking for an expression which will return
>> the vector
>> 
>>   c(C.W[[1]]$level,C.W[[2]]$level,...,C.W[[28]]$level)
> 
> sapply(C.W, function(x) x$level)
> 
> should do it.  (It will contain repeats if your contours have gaps in 
> them; unique() can remove those.)
> 
> Duncan Murdoch

Excellent! That does do it. Indeed, It is precisely to identify
the "broken" contours that I want the set of levels, so the repeats
are what I'm looking for!

Thanks Duncan,
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 22-Oct-09   Time: 20:10:34
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bayesian regression stepwise function?

2009-10-22 Thread Frank E Harrell Jr

Ben Bolker wrote:



Allan.Y wrote:

Hi everyone,

I am wondering if there exists a stepwise regression function for the
Bayesian regression model.  I tried googling, but I couldn't find
anything.  I know "step" function exists for regular stepwise regression,
but nothing for Bayes.



Why?  That seems so ... un-Bayesian ...


Exactly.  I hope it doesn't exist.  The beauty of Bayes is shrinkage, 
borrowing of information, and statement of results in an intuitive way.


Frank

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] PDF too large, PNG bad quality

2009-10-22 Thread Lasse Kliemann
I wish to save a scatter plot comprising approx. 2 million points
in order to include it in a LaTeX document.

Using 'pdf(...)' produces a file of size about 20 MB, which is 
useless.

Using 'cairo_pdf(...)' produces a smaller file, around 3 MB. This 
is still too large. Not only that the document will be too large, 
but also PDF viewers choke on this. Moreover, Cairo has problems 
with text: by default text looks ugly, like scaled bitmaps. After 
hours of trying different settings, I discovered that choosing a 
different font family can help, e.g.: 'par(family="Mono")'. This 
gives good-looking text. Yet, the problem with the file size 
remains.

There exists the hint to produce EPS instead and then convert to 
PDF using 'epstopdf'. The resulting PDF files are slightly 
smaller, but still too large, and PDF viewers still don't like 
it.

So I gave PNG a try. PNG files are much smaller and PDF viewers 
have no trouble with them. However, fonts look ugly. The same 
trick that worked for Cairo PDF has no effect for PNG. When I 
view the PNGs with a dedicated viewer like 'qiv', even the fonts 
look good. But not when included in LaTeX; I simply use 
'\includegraphics{...}' and run the document through 'pdflatex'.

I tried both, creating PNG with 'png(...)' and converting from 
PDF to PNG using 'convert' from ImageMagick.

So my questions are:

- Is there a way to produce sufficiently lean PDFs directly in R, 
  even when the plot comprises several million points?

- How to produce a PNG that still looks nice when included in a 
  LaTeX PDF document?

Any hints will be greatly appreciated.

Thank you
Lasse


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple extraction of level,x,y from contourLines()

2009-10-22 Thread Duncan Murdoch

On 22/10/2009 2:57 PM, (Ted Harding) wrote:

A follow-up to my previous query.

Say one of the results returned by contourLines() is

  C.W <- contourLines()

Then C.W is a list of (in this case 28) lists,
each of which is a list with components $level (a single number),
$x (the x coords of the points on a contour at that level) and
$y (the y coords).

I can of course get these individually with, for the 5th one for
instance,

  C.W[[5]]$level
  C.W[[5]]$x
  C.W[[5]]$y

But I can't see how to obtain, in one line and without running
a nasty loop, to get all the levels at once!

In other words, I'm looking for an expression which will return
the vector

  c(C.W[[1]]$level,C.W[[2]]$level,...,C.W[[28]]$level)


sapply(C.W, function(x) x$level)

should do it.  (It will contain repeats if your contours have gaps in 
them; unique() can remove those.)


Duncan Murdoch


(This is probably something I should already have learned from
the documentation, but I think I may have been asleep during
that class ... ).

With thanks,
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 22-Oct-09   Time: 19:57:09
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] contour() & contourLines()

2009-10-22 Thread Ted Harding
Hi Peter!
Many thanks -- I had overlooked that line.
Ted.

On 22-Oct-09 18:53:25, Peter Ehlers wrote:
> Hi Ted,
> 
> Here's what the last example in ?contour says:
> 
>   "## contourLines produces the same contour lines as contour"
> 
> So, even without digging into the source, I would guess
> that your hunch is correct.
> 
> Cheers,
> Peter Ehlers
> 
> (Ted Harding) wrote:
>> Hi Folks,
>> I have been using contour() to produce some contour plots
>> (of a spatially-smoothed density produced by kde2d()), with
>> very satisfactory results.
>> 
>> I now want access to the coordinates of the points on the
>> contours, and it would seem that contour() does not return a
>> value, so there is nothing from which these could be extracted.
>> 
>> However, apparently contourLines() does, and it seems to be
>> invoked in essentially the same way (so far as the x,y,z
>> and levels parameters are concerned).
>> 
>> However, I am not sure whether the contour lines generated
>> by contourLines() are the same numerically as those generated
>> (at any rate plotted) by contour().
>> 
>> By over-plotting the coordinates from contourLines() on the
>> contours plotted by contour(), it looks as though they may be
>> the same.
>> 
>> Advice would be appreciated!
>> With thanks,
>> Ted.
>> 
>> 
>> E-Mail: (Ted Harding) 
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 22-Oct-09   Time: 19:03:37
>> -- XFMail --
>> 
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
>>
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 22-Oct-09   Time: 20:00:02
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Simple extraction of level,x,y from contourLines()

2009-10-22 Thread Ted Harding
A follow-up to my previous query.

Say one of the results returned by contourLines() is

  C.W <- contourLines()

Then C.W is a list of (in this case 28) lists,
each of which is a list with components $level (a single number),
$x (the x coords of the points on a contour at that level) and
$y (the y coords).

I can of course get these individually with, for the 5th one for
instance,

  C.W[[5]]$level
  C.W[[5]]$x
  C.W[[5]]$y

But I can't see how to obtain, in one line and without running
a nasty loop, to get all the levels at once!

In other words, I'm looking for an expression which will return
the vector

  c(C.W[[1]]$level,C.W[[2]]$level,...,C.W[[28]]$level)

(This is probably something I should already have learned from
the documentation, but I think I may have been asleep during
that class ... ).

With thanks,
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 22-Oct-09   Time: 19:57:09
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] contour() & contourLines()

2009-10-22 Thread Peter Ehlers

Hi Ted,

Here's what the last example in ?contour says:

 "## contourLines produces the same contour lines as contour"

So, even without digging into the source, I would guess
that your hunch is correct.
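
A quick way to check empirically, using the built-in volcano matrix:

```r
## overlay contourLines() output on a contour() plot of the same matrix
contour(volcano)
cl <- contourLines(volcano)
invisible(lapply(cl, function(l) lines(l$x, l$y, col = "red", lty = 2)))
## the dashed red lines sit exactly on top of the plotted contours
```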

Cheers,
Peter Ehlers

(Ted Harding) wrote:

Hi Folks,
I have been using contour() to produce some contour plots
(of a spatially-smoothed density produced by kde2d()), with
very satisfactory results.

I now want access to the coordinates of the points on the
contours, and it would seem that contour() does not return a
value, so there is nothing from which these could be extracted.

However, apparently contourLines() does, and it seems to be
invoked in essentially the same way (so far as the x,y,z
and levels parameters are concerned).

However, I am not sure whether the contour lines generated
by contourLines() are the same numerically as those generated
(at any rate plotted) by contour().

By over-plotting the coordinates from contourLines() on the
contours plotted by contour(), it looks as though they may be
the same.

Advice would be appreciated!
With thanks,
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 22-Oct-09   Time: 19:03:37
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help regarding creating package for GUI

2009-10-22 Thread Liviu Andronic
On 10/22/09, Peter Ehlers  wrote:
>  But as to your query: perhaps you could get some ideas from
>  Greg Snow's TeachingDemos package.
>
Also, check the related Rcmdr plug-in [1].
Liviu

[1] http://cran.r-project.org/web/packages/RcmdrPlugin.TeachingDemos/index.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] What happen for Negative binomial link in Lmer fonction?

2009-10-22 Thread Ben Bolker



ROBARDET Emmanuelle wrote:
> 
> Dear R users,
> I'm performing some GLMMs analysis with a negative binomial link.
> I already performed such analysis some months ago with the lmer() function
> but when I tried it today I encountered this problem:
> Erreur dans famType(glmFit$family) : unknown GLM family: 'Negative
> Binomial'
> 
> Does anyone know if the negative binomial family has been removed from
> this function?
> I really appreciate any response.
> Emmanuelle
> 
> 

I would be extremely surprised if this worked in the past; to
the best of my knowledge the negative binomial family has
never been implemented in lmer.  One could in principle
do a glmmPQL fit with the negative binomial family
(with a fixed value of the overdispersion parameter).
glmmADMB is another option.
Can you say which version etc. you were using???

Follow-ups should probably be sent to r-sig-mixed-mod...@r-project.org 
-- 
View this message in context: 
http://www.nabble.com/What-happen-for-Negative-binomial-link-in-Lmer-fonction--tp26013041p26015140.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bayesian regression stepwise function?

2009-10-22 Thread Ben Bolker



Allan.Y wrote:
> 
> Hi everyone,
> 
> I am wondering if there exists a stepwise regression function for the
> Bayesian regression model.  I tried googling, but I couldn't find
> anything.  I know "step" function exists for regular stepwise regression,
> but nothing for Bayes.
> 

Why?  That seems so ... un-Bayesian ...

 
-- 
View this message in context: 
http://www.nabble.com/Bayesian-regression-stepwise-function--tp26013725p26015081.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] arima crashes too

2009-10-22 Thread Barry Rowlingson
On Thu, Oct 22, 2009 at 6:19 PM, Alberto Monteiro
 wrote:
> Another pathological test.
>
> arima does not crash for that series that crashes arma:
>
> arima(c(2.01, 2.22, 2.09, 2.17, 2.42), order=c(1,0,0))
>
> However, arima crashes for this:
>
> arima(c(1.71, 1.78, 1.95, 1.59, 2.13), order=c(1,0,0))
>
> arima seems pretty consistent in its crashing behaviour, since crashing for
> one series means crashing for all affine series:

 I'm not getting what I'd call 'crashes' with your arma or arima
examples- I get an error message and a warning:

 > arma(c(2.01, 2.22, 2.09, 2.17, 2.42), order=c(1,0))
Error in AA %*% t(X) : requires numeric/complex matrix/vector arguments
In addition: Warning message:
In ar.ols(x, order.max = k, aic = FALSE, demean = FALSE, intercept =
include.intercept) :
  model order: 2singularities in the computation of the projection
matrixresults are only valid up to model order1

 You've not told us what you get, and the phrase 'crash' normally
means some kind of memory error that *terminates* a running R session.
Are you really crashing R such that it terminates? In which case, what
version number/platform etc?

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] contour() & contourLines()

2009-10-22 Thread Ted Harding
Hi Folks,
I have been using contour() to produce some contour plots
(of a spatially-smoothed density produced by kde2d()), with
very satisfactory results.

I now want access to the coordinates of the points on the
contours, and it would seem that contour() does not return a
value, so there is nothing from which these could be extracted.

However, apparently contourLines() does, and it seems to be
invoked in essentially the same way (so far as the x,y,z
and levels parameters are concerned).

However, I am not sure whether the contour lines generated
by contourLines() are the same numerically as those generated
(at any rate plotted) by contour().

By over-plotting the coordinates from contourLines() on the
contours plotted by contour(), it looks as though they may be
the same.

Advice would be appreciated!
With thanks,
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 22-Oct-09   Time: 19:03:37
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cairo package, png files within for loop are black?

2009-10-22 Thread Douglas Bates
On Thu, Oct 22, 2009 at 12:46 PM, Douglas M. Hultstrand
 wrote:
> Hello,
>
> I am generating .png images using the Cairo package in a for loop (looping
> on the number of zones, number of zones equals the number of plots to create
> based on different zone data).   When  I run the R script the .png files are
> created but they are all black?  If I comment out the for loop and force my
> zones to equal one the png file is created correctly?  Is there an issue
> with generating .png files within a for loop?

Probably not.  The behavior you observe is due to a lattice graphics
function being called in a loop.  When called within a loop you must
"print" or "show" the result of a lattice graphics function.  See FAQ
7.22
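
In other words, something along these lines (a sketch; the file names and the
iris data are placeholders for your own):

```r
library(lattice)
for (i in 1:3) {
  png(sprintf("plot%02d.png", i), width = 800, height = 600)
  p <- xyplot(Sepal.Length ~ Sepal.Width, data = iris)
  print(p)   # without the explicit print(), the file comes out blank
  dev.off()
}
```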

> Create .png commands within for loop:
>
> CairoPNG(paste(t.g), width=800, height=600, pointsize=12, bg="white")
>   xyplot(areasqmi ~ value, x.p, groups=dur,
>   main=(t.n), ylab=expression("Area(mi" ^ 2 * ")"),
>   xlab="Maximum Average Depth of Precipitation (inches)",
>   scales=list(y=list(log=TRUE)), ylim=c(1,100), type='l',
>   auto.key=list(space='right'))
> dev.off()
>
> --
> -
> Douglas M. Hultstrand, MS
> Senior Hydrometeorologist
> Metstat, Inc. Windsor, Colorado
> voice: 970.686.1253
> email: dmhul...@metstat.com
> web: http://www.metstat.com
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loop vs. apply(): strange behavior with data frame?

2009-10-22 Thread Roberto Perdisci
Thanks for the suggestion.
   I found some documentation on why accessing a data.frame using the
matrix notation (e.g., [i,j]) is so expensive, which was the cause of
the problem.
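
(For anyone hitting the same issue: the usual workaround is to convert the
data.frame to a matrix once, before the loop, so every subsequent [i,j] is a
cheap matrix index. A sketch mirroring the dist.for function from this thread:)

```r
## same nested-loop distance code, but indexing a matrix instead of a data.frame
dist.df.fast <- function(data) {
  m <- as.matrix(data)   # one conversion up front; '[' on a matrix is cheap
  n <- ncol(m); r <- nrow(m)
  d <- matrix(0, r, r)
  for (i in 1:r)
    for (j in 1:r)
      d[i, j] <- sum(abs(m[i, ] - m[j, ])) / n
  as.dist(d)
}
```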

regards,

Roberto

On Thu, Oct 22, 2009 at 12:05 AM, Jim Holtman  wrote:
> try running Rprof on the two examples to see what the difference is. what
> you will probably see is a lot of the time on the dataframe is spent in
> accessing it like a matrix ('['). Rprof is very helpful to see where time is
> spent in your scripts.
>
> Sent from my iPhone
>
> On Oct 21, 2009, at 17:17, Roberto Perdisci 
> wrote:
>
>> Hi everybody,
>>  I noticed a strange behavior when using loops versus apply() on a data
>> frame.
>> The example below "explicitly" computes a distance matrix given a
>> dataset. When the dataset is a matrix, everything works fine. But when
>> the dataset is a data.frame, the dist.for function written using
>> nested loops will take a lot longer than the dist.apply
>>
>> ### USING FOR ###
>>
>> dist.for <- function(data) {
>>
>>  d <- matrix(0,nrow=nrow(data),ncol=nrow(data))
>>  n <- ncol(data)
>>  r <- nrow(data)
>>
>>  for(i in 1:r) {
>>    for(j in 1:r) {
>>       d[i,j] <- sum(abs(data[i,]-data[j,]))/n
>>    }
>>  }
>>
>>  return(as.dist(d))
>> }
>>
>> ### USING APPLY ###
>>
>> f <- function(data.row,data.rest) {
>>
>>  r2 <- as.double(apply(data.rest,1,g,data.row))
>>
>> }
>>
>> g <- function(row2,row1) {
>>  return(sum(abs(row1-row2))/length(row1))
>> }
>>
>> dist.apply <- function(data) {
>>  d <- apply(data,1,f,data)
>>
>>  return(as.dist(d))
>> }
>>
>>
>> ### TESTING ###
>>
>> library(mvtnorm)
>> data <- rmvnorm(100,mean=seq(1,10),sigma=diag(1,nrow=10,ncol=10))
>>
>> tf <- system.time(df <- dist.for(data))
>> ta <- system.time(da <- dist.apply(data))
>>
>> print(paste('diff = ', sum(as.matrix(df) - as.matrix(da))))
>> print("tf = ")
>> print(tf)
>> print("ta = ")
>> print(ta)
>>
>> print('--')
>> print('Same experiment on data.frame...')
>> data2 <- as.data.frame(data)
>>
>> tf <- system.time(df <- dist.for(data2))
>> ta <- system.time(da <- dist.apply(data2))
>>
>> print(paste('diff = ', sum(as.matrix(df) - as.matrix(da))))
>> print("tf = ")
>> print(tf)
>> print("ta = ")
>> print(ta)
>>
>> 
>>
>> Here is the output I get on my system (R version 2.7.1 on a Debian lenny)
>>
>> [1] "diff =  0"
>> [1] "tf = "
>>  user  system elapsed
>>  0.088   0.000   0.087
>> [1] "ta = "
>>  user  system elapsed
>>  0.128   0.000   0.128
>> [1] "--"
>> [1] "Same experiment on data.frame..."
>> [1] "diff =  0"
>> [1] "tf = "
>>  user  system elapsed
>> 35.031   0.000  35.029
>> [1] "ta = "
>>  user  system elapsed
>>  0.184   0.000   0.185
>>
>> Could you explain why that happens?
>>
>> thank you,
>> regards
>>
>> Roberto
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to find moving averages within each subgroup of a data frame

2009-10-22 Thread clair.crossup...@googlemail.com
Dear all,

If I have the following data frame:

> set.seed(21)
> df1 <- data.frame(col1=c(rep('a',5), rep('b',5), rep('c',5)), 
> col4=rnorm(1:15))

   col1         col4
1     a  0.793013171
2     a  0.522251264
3     a  1.74641
4     a -1.271336123
5     a  2.197389533
6     b  0.433130777
7     b -1.570199630
8     b -0.934905667
9     b  0.063493345
10    b -0.002393336
11    c -2.276781240
12    c  0.757412225
13    c -0.548405554
14    c  0.172549478
15    c  0.562853068

How can i make a 2 point moving average within each group? i.e.

col1         col4          SMA
a     0.793013171           NA
a     0.522251264  0.657632218
a     1.74641      1.134236753
a    -1.271336123  0.237443059
a     2.197389533  0.463026705
b     0.433130777           NA
b    -1.57019963  -0.568534427
b    -0.934905667 -1.252552649
b     0.063493345 -0.435706161
b    -0.002393336  0.030550005
c    -2.27678124            NA
c     0.757412225 -0.759684508
c    -0.548405554  0.104503336
c     0.172549478 -0.187928038
c     0.562853068  0.367701273

From what I've read, I was thinking it would be something along the
lines of:

> aggregate(df1$col4, by=list(df1$col1), function(x) {filter(x, rep(1/2,2), 
> sides=1 )} )
Error in aggregate.data.frame(as.data.frame(x), ...) :
  'FUN' must always return a scalar

But this tells me (I think) that aggregate should only return a single
value per group. So what I need, I guess, is something that takes all
the values in a given group and returns a vector of the same length.
Not sure which function to use for that.
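[Editor's note: `ave()` is one base-R function that does exactly this: it applies a function group-wise and returns the results in the original order and length. A minimal sketch, rebuilding df1 as above and reusing the `filter()` call from the aggregate attempt:

```r
set.seed(21)
df1 <- data.frame(col1 = rep(c("a", "b", "c"), each = 5),
                  col4 = rnorm(15))

# ave() splits col4 by col1, applies FUN to each piece, and puts the
# results back in the original order (unlike aggregate(), FUN may
# return a vector of the same length as its input).
df1$SMA <- ave(df1$col4, df1$col1,
               FUN = function(x) stats::filter(x, rep(1/2, 2), sides = 1))
df1  # first row of each group is NA, as in the desired output
```

With `sides = 1`, `filter()` uses the current and previous observations, so each group's first value is NA, matching the desired output above.]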

Thanks in advance,

C xx

P.S. on a side note, is it possible to extract the values which
aggregate passes to the function(x) in my example above?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Bayesian regression stepwise function?

2009-10-22 Thread Allan.Y

Hi everyone,

I am wondering if there exists a stepwise regression function for the
Bayesian regression model.  I tried googling, but I couldn't find anything. 
I know "step" function exists for regular stepwise regression, but nothing
for Bayes.


Thanks
-- 
View this message in context: 
http://www.nabble.com/Bayesian-regression-stepwise-function--tp26013725p26013725.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Cairo package, png files within for loop are black?

2009-10-22 Thread Douglas M. Hultstrand

Hello,

I am generating .png images using the Cairo package in a for loop 
(looping on the number of zones, number of zones equals the number of 
plots to create based on different zone data).   When  I run the R 
script the .png files are created but they are all black?  If I comment 
out the for loop and force my zones to equal one the png file is created 
correctly?  Is there an issue with generating .png files within a for loop?


Create .png commands within for loop:

CairoPNG(paste(t.g), width=800, height=600, pointsize=12, bg="white")
   xyplot(areasqmi ~ value, x.p, groups=dur,
   main=(t.n), ylab=expression("Area(mi" ^ 2 * ")"),
   xlab="Maximum Average Depth of Precipitation (inches)",
   scales=list(y=list(log=TRUE)), ylim=c(1,100), type='l',
   auto.key=list(space='right'))
dev.off()
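[Editor's note: this is the classic lattice-in-a-loop problem (R FAQ 7.22). `xyplot()` returns a "trellis" object that is auto-printed only at the top level of the interpreter; inside a for loop or sourced script nothing is drawn, so the device is saved blank. Wrapping the call in `print()` should fix it. A self-contained sketch with toy stand-ins for the per-zone objects (`dur` levels and file names here are made up):

```r
library(lattice)

# Toy stand-in for the per-zone data in the original script
x.p <- data.frame(value    = runif(20),
                  areasqmi = runif(20, 1, 100),
                  dur      = rep(c("6hr", "24hr"), 10))

for (zone in 1:2) {
  # base png() used here for portability; CairoPNG() behaves the same way
  png(sprintf("zone%02d.png", zone), width = 800, height = 600)
  print(                  # explicit print(): lattice objects are not
    xyplot(areasqmi ~ value, x.p, groups = dur,   # auto-printed in a loop
           scales = list(y = list(log = TRUE)), type = "l",
           auto.key = list(space = "right"))
  )
  dev.off()
}
```

This also explains why the plot worked with the loop commented out: at top level the returned trellis object is printed automatically.]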

--
-
Douglas M. Hultstrand, MS
Senior Hydrometeorologist
Metstat, Inc. Windsor, Colorado
voice: 970.686.1253
email: dmhul...@metstat.com
web: http://www.metstat.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help regarding creating package for GUI

2009-10-22 Thread Peter Ehlers

Well, I wonder which 'Sister' you're thinking of. I can think of
more than a few ... Sarah? Ulrike? Heather? ...

And I'm not so sure God would bless an old sinner like me.

But as to your query: perhaps you could get some ideas from
Greg Snow's TeachingDemos package.

 -Peter Ehlers

SANKET JANEWOO wrote:

Hello Brothers/Sister,

   I am trying to create a GUI for some formulas
in R using tcltk and then convert them into R package . Can anyone
please tell me how to create a package having GUI capabilities in it.

I would be thankful if somebody can help me. May God bless every one of
us living in this beautiful world.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] arima crashes too

2009-10-22 Thread Alberto Monteiro
Another pathological test.

arima does not crash for that series that crashes arma:

arima(c(2.01, 2.22, 2.09, 2.17, 2.42), order=c(1,0,0))

However, arima crashes for this:

arima(c(1.71, 1.78, 1.95, 1.59, 2.13), order=c(1,0,0))

arima seems pretty consistent in its crashing behaviour, since crashing for 
one series means crashing for all affine series:

lets.crash.arima <- c(71, 78, 95, 59, 113)
arima(100 + lets.crash.arima, order = c(1,0,0)) # crashes
arima(-100 + lets.crash.arima, order = c(1,0,0)) # crashes
arima(1 - 0.5 * lets.crash.arima, order = c(1,0,0)) # crashes


Curiously, arma does not crash for this second series:

arma(c(1.71, 1.78, 1.95, 1.59, 2.13), order=c(1,0))

OTOH, I find it amazing that arma crashes or does not crash for series that 
differ only on a constant:

lets.crash.arma <- c(1, 22, 9, 17, 42)
arma(lets.crash.arma, order=c(1,0)) # does not crash
arma(20 + lets.crash.arma, order=c(1,0)) # does not crash
arma(30 + lets.crash.arma, order=c(1,0)) # crashes
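[Editor's note: if "crash" here means an R-level error rather than a segfault, a defensive pattern when batch-fitting many short series is to wrap each call in `tryCatch()` so one failure does not abort the whole run. A sketch, using the series from this report:

```r
# Return the fitted model, or NULL (with a message) if arima() errors out
fit.safe <- function(x) {
  tryCatch(arima(x, order = c(1, 0, 0)),
           error = function(e) {
             message("arima failed: ", conditionMessage(e))
             NULL
           })
}

series <- list(c(2.01, 2.22, 2.09, 2.17, 2.42),
               c(1.71, 1.78, 1.95, 1.59, 2.13))
fits <- lapply(series, fit.safe)
```

This does not help with a true segfault, of course; that would still be worth a report to R-devel with the reproducible example above.]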

Alberto 'Darth Albmont' Monteiro

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to calculate the area under the curve

2009-10-22 Thread Frank E Harrell Jr
I'm not clear on why we are emphasizing the trapezoidal rule when the 
Wilcoxon approach gives you everything plus a standard error.


Frank


Ravi Varadhan wrote:

I assume that you have an ordered pair (x, y) data, where x = sensitivity, and 
y = 1 - specificity.  Your `x' values may or may not be equally spaced.  Here 
is how you could solve your problem.  I show this with an example where we can 
compute the area under the curve exactly:

# Area under the curve
#
# Trapezoidal rule
# x values need not be equally spaced
#
trapezoid <- function(x,y) sum(diff(x)*(y[-1]+y[-length(y)]))/2
#
#
# Simpson's rule when `n' is odd
# Composite Simpson and Trapezoidal rules when `n' is even
# x values must be equally spaced
#
simpson <- function(x, y){
n <- length(y)
odd <- n %% 2
if (odd) area <- 1/3*sum( y[1] + 2*sum(y[seq(3,(n-2),by=2)]) + 
	4*sum(y[seq(2,(n-1),by=2)]) + y[n])


if (!odd) area <- 1/3*sum( y[1] + 2*sum(y[seq(3,(n-3),by=2)]) + 
	4*sum(y[seq(2,(n-2),by=2)]) + y[n-1]) + 1/2*(y[n-1] + y[n])


dx <- x[2] - x[1]
return(area * dx)
}
#
# An example for AUC calculation
x <- seq(0, 1, length=21)

roc <- function(x, a) x + a * x * (1 - x)

plot(x, roc(x, a=0.5), type="l")
lines(x, roc(x, a=0.8), col=2)
lines(x, roc(x, a=1.2), col=3)
abline(a=0, b=1, lty=2)

y <- roc(x, a=1)

trapezoid(x, y)  # exact answer is 2/3

simpson(x, y) # exact answer is 2/3

As you can see, Simpson's rule is more accurate, but the difference should not matter 
in applications, as long as you have a sufficient number of points for sensitivity and 
specificity.  Also, note that the improved accuracy of Simpson's rule is more fully 
realized when there is an odd number of `x' values.  If the number of points is 
even, the trapezoidal rule at the end point degrades the accuracy of the Simpson 
approximation.

Hope this helps,
Ravi.



Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


- Original Message -
From: "olivier.abz" <0509...@rgu.ac.uk>
Date: Thursday, October 22, 2009 10:24 am
Subject: [R]  How to calculate the area under the curve
To: r-help@r-project.org


Hi all, 


I would like to calculate the area under the ROC curve for my predictive
model. I have managed to plot points giving me the ROC curve. However, 
I do
not know how to get the value of the area under. 
Does anybody know of a function that would give the result I want 
using an

array of specificity and an array of sensitivity as input?

Thanks, 


Olivier
--
View this message in context: 
Sent from the R help mailing list archive at Nabble.com.


__
R-help@r-project.org mailing list

PLEASE do read the posting guide 
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple question, I think

2009-10-22 Thread Chuck Cleland
On 10/22/2009 12:37 PM, David Kaplan wrote:
> Greetings,
> 
> I am recoding a dummy variable (coded 1,0) so that 0 = 2.  I am using
> the line
> 
> sciach$dummyba[sciach$ba==0] <- 2
> 
> I notice that it creates a new column dummyba, with 0 coded as 2 but
> with 1's now coded as NA.  Is there a simple way around this in the line
> I am using, or do I need to have an additional line
> 
> sciach$dummyba[sciach$ba==1] <- 1

  You might do the recoding like this:

with(sciach, ifelse(ba == 0, 2, ifelse(ba == 1, 1, NA)))

  or like this:

sciach$ba * -1 + 2

> Thanks in advance. 
> 
> David
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code. 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple question, I think

2009-10-22 Thread Kevin E. Thorpe

David Kaplan wrote:

Greetings,

I am recoding a dummy variable (coded 1,0) so that 0 = 2.  I am using 
the line


sciach$dummyba[sciach$ba==0] <- 2

I notice that it creates a new column dummyba, with 0 coded as 2 but 
with 1's now coded as NA.  Is there a simple way around this in the line 
I am using, or do I need to have an additional line


sciach$dummyba[sciach$ba==1] <- 1

Thanks in advance.


David


Try sciach$dummyba <- ifelse(sciach$ba==0,2,1)


--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] arma crashing

2009-10-22 Thread Alberto Monteiro
Function arma is crashing in some (pathological, but crashing is never good) 
cases.

For example:

library(tseries)
arma(c(2.01, 2.22, 2.09, 2.17, 2.42), order=c(1,0))

I came to that pathological series while doing test cases; probably there 
are crashing cases with longer series.

Alberto Monteiro

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple question, I think

2009-10-22 Thread Henrique Dallazuanna
Use ifelse:

sciach$dummyba <- ifelse(sciach$ba == 0, 2, 1)

On Thu, Oct 22, 2009 at 2:37 PM, David Kaplan
 wrote:
> Greetings,
>
> I am recoding a dummy variable (coded 1,0) so that 0 = 2.  I am using the
> line
>
> sciach$dummyba[sciach$ba==0] <- 2
>
> I notice that it creates a new column dummyba, with 0 coded as 2 but with
> 1's now coded as NA.  Is there a simple way around this in the line I am
> using, or do I need to have an additional line
>
> sciach$dummyba[sciach$ba==1] <- 1
>
> Thanks in advance.
>
>
> David
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Simple question, I think

2009-10-22 Thread David Kaplan

Greetings,

I am recoding a dummy variable (coded 1,0) so that 0 = 2.  I am using 
the line


sciach$dummyba[sciach$ba==0] <- 2

I notice that it creates a new column dummyba, with 0 coded as 2 but 
with 1's now coded as NA.  Is there a simple way around this in the line 
I am using, or do I need to have an additional line


sciach$dummyba[sciach$ba==1] <- 1

Thanks in advance.


David

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Boxplot with grouped data

2009-10-22 Thread Paul Smith
On Thu, Oct 22, 2009 at 5:31 PM, Jakson A. Aquino
 wrote:
>> Is there some way of drawing a boxplot, with R, when one does not have
>> the original continuous data, but only the data grouped in classes?
>> The function boxplot() can only deal with original data.
>
> Do you mean a numeric vector grouped by a factor? If you have a
> numeric vector x and a categorical variable f, then:
>
> boxplot(x ~ f)

No, Jakson, I mean something like the following:

[0,1] --> 5%
]1,3] --> 25%
]3,4] --> 10%
]4,7] --> 20%
]7,11] --> 40%

Paul
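[Editor's note: given only class boundaries and percentages like these, one rough option (an approximation, not an exact boxplot) is to replicate each class midpoint in proportion to its frequency and boxplot the resulting pseudo-sample:

```r
# Class boundaries and percentages from the example above
breaks <- c(0, 1, 3, 4, 7, 11)
pct    <- c(5, 25, 10, 20, 40)                   # percent in each class

mids <- (head(breaks, -1) + tail(breaks, -1)) / 2  # class midpoints
y    <- rep(mids, times = pct)                     # pseudo-sample of 100 points
boxplot(y)
```

The quartiles and whiskers are only as accurate as the midpoint approximation, so with wide classes the plot should be read as indicative rather than exact.]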

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Boxplot with grouped data

2009-10-22 Thread Jakson A. Aquino
On Thu, Oct 22, 2009 at 04:36:22PM +0100, Paul Smith wrote:
> Is there some way of drawing a boxplot, with R, when one does not have
> the original continuous data, but only the data grouped in classes?
> The function boxplot() can only deal with original data.

Do you mean a numeric vector grouped by a factor? If you have a
numeric vector x and a categorical variable f, then:

boxplot(x ~ f)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] help regarding creating package for GUI

2009-10-22 Thread SANKET JANEWOO
Hello Brothers/Sister,

   I am trying to create a GUI for some formulas
in R using tcltk and then convert them into R package . Can anyone
please tell me how to create a package having GUI capabilities in it.

I would be thankful if somebody can help me. May God bless every one of
us living in this beautiful world.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] What happen for Negative binomial link in Lmer fonction?

2009-10-22 Thread ROBARDET Emmanuelle
Dear R users,
I'm performing some GLMMs analysis with a negative binomial link.
I already performed such analysis some months ago with the lmer() function but 
when I tried it today I encountered this problem:
Erreur dans famType(glmFit$family) : unknown GLM family: 'Negative Binomial'

Does anyone know if the negative binomial family has been removed from this 
function?
I really appreciate any response.
Emmanuelle




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Boxplot with grouped data

2009-10-22 Thread Peter Flom

Paul Smith  wrote
>
>Is there some way of drawing a boxplot, with R, when one does not have
>the original continuous data, but only the data grouped in classes?
>The function boxplot() can only deal with original data.

It's not clear what form the data are in now.  What are the classes? Are they numbers? 

Let's make something up.  Is this what you have and want?

y <- c(rep("0-10", 5), rep("11-20", 10), rep("21-30", 15))

y <- as.ordered(y)
y <- as.numeric(y)
y[y == 1] <- 5
y[y == 2] <- 15.5
y[y == 3] <- 25.5
boxplot(y)

?

Peter


Peter L. Flom, PhD
Statistical Consultant
Website: www DOT peterflomconsulting DOT com
Writing; http://www.associatedcontent.com/user/582880/peter_flom.html
Twitter:   @peterflom

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Boxplot with grouped data

2009-10-22 Thread Paul Smith
Dear All,

Is there some way of drawing a boxplot, with R, when one does not have
the original continuous data, but only the data grouped in classes?
The function boxplot() can only deal with original data.

Thanks in advance,

Paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Replacing multiple elements in a vector !

2009-10-22 Thread Henrique Dallazuanna
Try this:

merge(rs.id, snp.id, by.x = 1, by.y = 2)$V2

or:

with(snp.id, V2[match(rs.id, V3)])

On Thu, Oct 22, 2009 at 12:12 PM, Praveen Surendran
 wrote:
> Hi,
>
>
>
> I have a vector with elements
>
>
>
> rs.id=c('rs100','rs101','rs102','rs103')
>
>
>
> And a dataframe 'snp.id'
>
>
>
> 1              SNP_100              rs100
>
> 2              SNP_101              rs101
>
> 3              SNP_102              rs102
>
> 4              SNP_103              rs103
>
>
>
> Task is to replace rs.id vector with corresponding 'SNP_' ids  in snp.id.
>
>
>
> Thanks in advance.
>
>
>
> Praveen Surendran
>
> 2G, Complex and Adaptive Systems Laboratory (UCD CASL)
>
> School of Medicine and Medical Sciences
>
> University College Dublin
>
> Belfield, Dublin 4
>
> Ireland.
>
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Replacing multiple elements in a vector !

2009-10-22 Thread Steve Lianoglou

Hi,

On Oct 22, 2009, at 10:12 AM, Praveen Surendran wrote:


Hi,

I have a vector with elements

rs.id=c('rs100','rs101','rs102','rs103')

And a dataframe 'snp.id'

1  SNP_100  rs100
2  SNP_101  rs101
3  SNP_102  rs102
4  SNP_103  rs103

Task is to replace rs.id vector with corresponding 'SNP_' ids  in  
snp.id.



First, please post your code and data in such a way that it's easy for  
those of us trying to help you to paste into our R session quickly,  
for instance, a better way to have given your data.frame would have  
been like so:


R> rs.id <- c('rs100','rs101','rs102','rs103')
R> df <- data.frame(snp.id=paste("SNP", 100:103, sep="_"),
                    rs.id=rs.id, stringsAsFactors=FALSE)


Now, assuming rs.id <-> snp.id is a one-to-one mapping:

1. One way to get the appropriate df$snp.id for each value in your  
rs.id vector is to use ``match``:


R> df$snp.id[match(rs.id, df$rs.id)]
[1] "SNP_100" "SNP_101" "SNP_102" "SNP_103"

2. Alternatively, you can use the "rs*" values as the rownames of your  
data.frame, and use the rs.id value to select:


R> rownames(df) <- df$rs.id
R> df[rs.id, 'snp.id']
[1] "SNP_100" "SNP_101" "SNP_102" "SNP_103"


-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm.fit to use LAPACK instead of LINPACK

2009-10-22 Thread Ravi Varadhan
Ted,

LAPACK is newer and is supposed to contain better algorithms than LINPACK.  It 
is not true that LAPACK cannot handle rank-deficient problems.  It can.

However, I do not know if using LAPACK in glm.fit instead of LINPACK would be 
faster and/or more memory efficient.

Ravi.



Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


- Original Message -
From: Ted 
Date: Thursday, October 22, 2009 10:53 am
Subject: Re: [R] glm.fit to use LAPACK instead of LINPACK
To: "r-help@R-project.org" 


> Hi,
> 
> I understand that the glm.fit calls LINPACK fortran routines instead of
> LAPACK because it can handle the 'rank deficiency problem'.  If my data
> matrix is not rank deficient, would a glm.fit function which runs on
> LAPACK be faster?  Would this be worthwhile to convert glm.fit to use
> LAPACK?  Has anyone done this already??  What is the best way to do this?
> 
> I'm looking at very large datasets (thousands of glm calls), and would
> like to know if it's worth the effort for performance issues.
> 
> Thanks,
> 
> Ted
> 
> -
> Ted Chiang
>   Bioinformatics Analyst
>   Centre for Computational Biology
>   Hospital for Sick Children, Toronto
>   416.813.7028
>   tchi...@sickkids.ca
> 
> __
> R-help@r-project.org mailing list
> 
> PLEASE do read the posting guide 
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to calculate the area under the curve

2009-10-22 Thread Tom Fletcher
See package ROCR. Then see ?performance; in the details, it describes a
measure of auc.

Tom Fletcher



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of olivier.abz
Sent: Thursday, October 22, 2009 9:23 AM
To: r-help@r-project.org
Subject: [R] How to calculate the area under the curve


Hi all, 

I would like to calculate the area under the ROC curve for my predictive
model. I have managed to plot points giving me the ROC curve. However, I
do not know how to get the value of the area under. 
Does anybody know of a function that would give the result I want using
an array of specificity and an array of sensitivity as input?

Thanks, 

Olivier
--
View this message in context:
http://www.nabble.com/How-to-calculate-the-area-under-the-curve-tp26010501p26010501.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to calculate the area under the curve

2009-10-22 Thread Ravi Varadhan

I assume that you have an ordered pair (x, y) data, where x = sensitivity, and 
y = 1 - specificity.  Your `x' values may or may not be equally spaced.  Here 
is how you could solve your problem.  I show this with an example where we can 
compute the area under the curve exactly:

# Area under the curve
#
# Trapezoidal rule
# x values need not be equally spaced
#
trapezoid <- function(x,y) sum(diff(x)*(y[-1]+y[-length(y)]))/2
#
#
# Simpson's rule when `n' is odd
# Composite Simpson and Trapezoidal rules when `n' is even
# x values must be equally spaced
#
simpson <- function(x, y){
n <- length(y)
odd <- n %% 2
if (odd) area <- 1/3*sum( y[1] + 2*sum(y[seq(3,(n-2),by=2)]) + 
4*sum(y[seq(2,(n-1),by=2)]) + y[n])

if (!odd) area <- 1/3*sum( y[1] + 2*sum(y[seq(3,(n-3),by=2)]) + 
4*sum(y[seq(2,(n-2),by=2)]) + y[n-1]) + 1/2*(y[n-1] + y[n])

dx <- x[2] - x[1]
return(area * dx)
}
#
# An example for AUC calculation
x <- seq(0, 1, length=21)

roc <- function(x, a) x + a * x * (1 - x)

plot(x, roc(x, a=0.5), type="l")
lines(x, roc(x, a=0.8), col=2)
lines(x, roc(x, a=1.2), col=3)
abline(a=0, b=1, lty=2)

y <- roc(x, a=1)

trapezoid(x, y)  # exact answer is 2/3

simpson(x, y) # exact answer is 2/3

As you can see, Simpson's rule is more accurate, but the difference should 
not matter in applications, as long as you have a sufficient number of points for 
sensitivity and specificity.  Also, note that the improved accuracy of 
Simpson's rule is more fully realized when there is an odd number of `x' 
values.  If the number of points is even, the trapezoidal rule at the end point 
degrades the accuracy of the Simpson approximation.

Hope this helps,
Ravi.



Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


- Original Message -
From: "olivier.abz" <0509...@rgu.ac.uk>
Date: Thursday, October 22, 2009 10:24 am
Subject: [R]  How to calculate the area under the curve
To: r-help@r-project.org


> Hi all, 
> 
> I would like to calculate the area under the ROC curve for my predictive
> model. I have managed to plot points giving me the ROC curve. However, 
> I do
> not know how to get the value of the area under. 
> Does anybody know of a function that would give the result I want 
> using an
> array of specificity and an array of sensitivity as input?
> 
> Thanks, 
> 
> Olivier
> -- 
> View this message in context: 
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> 
> PLEASE do read the posting guide 
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to calculate the area under the curve

2009-10-22 Thread William Dunlap

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of olivier.abz
> Sent: Thursday, October 22, 2009 7:23 AM
> To: r-help@r-project.org
> Subject: [R] How to calculate the area under the curve
> 
> 
> Hi all, 
> 
> I would like to calculate the area under the ROC curve for my 
> predictive
> model. I have managed to plot points giving me the ROC curve. 

If x and y are the coordinates of the edges of the polygon
(e.g., polygon(x,y) would draw the polygon) then the polygon's
signed area is given by the following
area<-function(x,y)sum(x*c(y[-1],y[1]) - c(x[-1],x[1])*y)/2
(Positive if edge is traced counter-clockwise.)

This is a discrete version of Green's theorem.
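[Editor's note: a quick sanity check on the signed-area formula, with vertices traced counter-clockwise giving a positive result:

```r
# Shoelace formula: signed area of the polygon with vertices (x, y)
area <- function(x, y) sum(x * c(y[-1], y[1]) - c(x[-1], x[1]) * y) / 2

# Unit square traced counter-clockwise: area +1
area(c(0, 1, 1, 0), c(0, 0, 1, 1))   # 1

# Same square traced clockwise: area -1
area(c(0, 0, 1, 1), c(0, 1, 1, 0))   # -1
```

For a ROC curve, one would close the polygon along the x-axis (append the points (1, 0) and (0, 0)) before applying the formula, and take the absolute value if the orientation is not known.]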

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> However, I do
> not know how to get the value of the area under. 
> Does anybody know of a function that would give the result I 
> want using an
> array of specificity and an array of sensitivity as input?
> 
> Thanks, 
> 
> Olivier
> -- 
> View this message in context: 
> http://www.nabble.com/How-to-calculate-the-area-under-the-curve-tp26010501p26010501.html
> Sent from the R help mailing list archive at Nabble.com.
> 



Re: [R] How to calculate the area under the curve

2009-10-22 Thread Sebastien Bihorel

Well, you can use the trapezoidal rule to numerically calculate any area under
the curve. I don't know if a specific function exists, but you could create one.
The principle is basically to compute the area between two successive points of
your profile with:

AREA = 0.5*(Response1 + Response2)*(Time2 - Time1)

where Time1 and Time2 are the times of Response1 and Response2. You then add
all the areas together to obtain the total area between your first and last
point.


HTH
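The rule above can be sketched in a few lines of base R (the function name auc is just illustrative):

```r
# Trapezoidal approximation of the area under a set of (x, y) points
auc <- function(x, y) {
  ord <- order(x)                  # make sure the points are in x order
  x <- x[ord]; y <- y[ord]
  sum(0.5 * (y[-1] + y[-length(y)]) * diff(x))
}

auc(c(0, 1, 2), c(0, 1, 2))   # area under y = x on [0, 2], i.e. 2
```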


Hi all, 


I would like to calculate the area under the ROC curve for my predictive
model. I have managed to plot points giving me the ROC curve. However, I do
not know how to get the value of the area under. 
Does anybody know of a function that would give the result I want using an

array of specificity and an array of sensitivity as input?

Thanks, 


Olivier



Re: [R] How to calculate the area under the curve

2009-10-22 Thread Frank E Harrell Jr

olivier.abz wrote:
Hi all, 


I would like to calculate the area under the ROC curve for my predictive
model. I have managed to plot points giving me the ROC curve. However, I do
not know how to get the value of the area under. 
Does anybody know of a function that would give the result I want using an

array of specificity and an array of sensitivity as input?

Thanks, 


Olivier


Olivier,

The ROC curves in my view just get in the way.  They are mainly useful 
in that, almost by accident, the area under the curve equals a nice pure 
discrimination index.  Go for the direct calculation of the ROC area 
based on the Wilcoxon-Mann-Whitney-Somers' Dxy rank correlation 
approach, e.g., using the Hmisc package's rcorr.cens function, which 
provides Dxy = 2(C-.5) where C = ROC area.  It also provides the S.E. of 
Dxy and thus of C, and generalizes to censored data.  This approach uses 
the raw data, not sensitivity and specificity (which are improper 
scoring rules).  This is assuming you are using an external validation 
dataset.  If this is not the case you will need to use the bootstrap or 
intensive cross-validation, e.g., using the rms package's lrm and 
validate functions.
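rcorr.cens itself lives in the Hmisc package; as a base-R illustration of the Dxy = 2(C - .5) identity (the simulated data below are purely illustrative), C is the scaled Wilcoxon-Mann-Whitney statistic:

```r
# ROC area (C) as the scaled Wilcoxon-Mann-Whitney statistic, base R only
cindex <- function(pred, y) {
  w <- wilcox.test(pred[y == 1], pred[y == 0], exact = FALSE)$statistic
  as.numeric(w) / (sum(y == 1) * sum(y == 0))
}

set.seed(1)
y    <- rbinom(200, 1, 0.5)
pred <- y + rnorm(200)      # simulated predictor, higher for events
C    <- cindex(pred, y)
Dxy  <- 2 * (C - 0.5)       # Somers' Dxy from the ROC area
```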


Also note that it is not usually appropriate to compare two ROC areas 
for choosing a model as this is too insensitive.  It is the same as 
taking the difference between two scaled Wilcoxon statistics which is 
simply not done.


Frank

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] glm.fit to use LAPACK instead of LINPACK

2009-10-22 Thread Ted

Hi,

I understand that glm.fit calls LINPACK Fortran routines instead of
LAPACK because it can handle the rank-deficiency problem.  If my data
matrix is not rank deficient, would a glm.fit function that runs on
LAPACK be faster?  Would it be worthwhile to convert glm.fit to use
LAPACK?  Has anyone done this already?  What is the best way to do this?

I'm looking at very large datasets (thousands of glm calls), and would
like to know if it's worth the effort for performance issues.
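One point worth checking before any rewrite: for thousands of small fits, much of the elapsed time in glm() is formula and model-frame handling rather than the LINPACK solve, so calling glm.fit() directly on a precomputed model matrix may already help. A hedged sketch (simulated data, illustrative sizes):

```r
set.seed(42)
n <- 1000
X <- cbind(1, matrix(rnorm(n * 3), n, 3))   # precomputed model matrix
y <- rbinom(n, 1, 0.5)

# Same IRLS fit either way; glm.fit() skips formula/model-frame overhead
fit.full <- glm(y ~ X - 1, family = binomial())
fit.bare <- glm.fit(X, y, family = binomial())
all.equal(unname(coef(fit.full)), unname(fit.bare$coefficients))  # TRUE

# Compare the per-call overhead
system.time(for (i in 1:50) glm(y ~ X - 1, family = binomial()))
system.time(for (i in 1:50) glm.fit(X, y, family = binomial()))
```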

Thanks,

Ted

-
Ted Chiang
 Bioinformatics Analyst
 Centre for Computational Biology
 Hospital for Sick Children, Toronto
 416.813.7028
 tchi...@sickkids.ca



Re: [R] how to create a new data type

2009-10-22 Thread cls59



Markus Weisner-2 wrote:
> 
> I am working on a new package to do fire department analysis.  I am
> working
> with emergency dispatch data from different agencies that all contain the
> same information but have slightly different formats.  Typically the
> variable names and date-time formats are different.  I am getting pretty
> good at reading in and rearranging the data so that it is essentially in
> the
> same format, but it is common for me to have an occasional wrong variable
> name or data type.  These slight deviations wreak havoc on all my
> functions
> that are setup to automatically analyze the data (and require specific
> variable names and data types).
> 
> I would like to create a new data type that has defined variable names and
> types (where I would be forced to have the data in the correct format). 
> If
> I had my own unique data type, each of my analysis functions could check
> to
> make sure that provided data has the correct data type ... thus
> eliminating
> all these little debugging operations I have to keep doing to get the
> functions to work right.
> 
> Any suggestions on how to do something like this?  I have done some
> research
> online, but could not find any simple explanations of creating user
> defined
> object types.  Thanks in advance for your help.
> 
> Best,
> Markus
> 
> 

Well, you can get this functionality by defining an S4 class:

  require( methods )
  setClass( 'myClass',
    representation = representation( from   = 'character',
                                     to     = 'character',
                                     number = 'numeric' ),
    prototype = prototype( from = character(), to = character(), number = 0 ),
    validity = function( object ){

      if( object@number < 0 ){

        return( "The number must be non-negative!" )

      }

      return( TRUE )

    }
  )

That will define a class containing three components-- two character vectors
named "from" and "to" and a numeric vector named "number". You can give the
names of any class, such as 'list' or 'data.frame', as components of an S4
class. The prototype specifies that each value will default to a vector of
length 0 except in the case of the number which defaults to 0. The validity
is an optional custom function that performs specific checks on the class
members-- in this case it ensures that the value given for the number is
non-negative. Returning anything but TRUE from the validity function will be
interpreted as an error.

New objects of a given class are created using new():

  myObject <- new( 'myClass', from = 'foo', to = 'bar', number = 3 )

Components, known as "slots", are accessed using the @ operator in much the
same way that list items are accessed using $:

  myObject@number

[1] 3


I have to run, but for more info you can check the documentation of the methods
package; the R wiki page on S4 also has some pointers to other references.

Hope this helps!

-Charlie



-
Charlie Sharpsteen
Undergraduate
Environmental Resources Engineering
Humboldt State University



Re: [R] Diffusion of particles inside a sphere

2009-10-22 Thread Ravi Varadhan
Hi,

You need to be a lot more specific about what you want.  Are you trying to 
solve a partial differential equation (PDE) in time and space, in spherical 
coordinates?  Do you have a closed form solution that provides the values of 
the dependent variable (e.g. concentration or temperature) as a function of 
time and location? If so, it is  a simple matter of writing an R function to 
compute that and plot the results or do whatever you want to do with the 
result.  If there is no closed form solution to the PDE, then you have to solve 
it numerically using finite-differencing coupled with a time-marching scheme 
such as Euler's or Runge-Kutta schemes.  
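If it comes to the finite-difference route, here is a deliberately crude explicit-Euler sketch for radially symmetric diffusion inside a sphere, with an absorbing surface standing in for the release boundary (every parameter value below is illustrative, not from the original question):

```r
# Explicit Euler for dc/dt = D * (c'' + (2/r) c') on 0 <= r <= R,
# symmetry at the centre, c = 0 at the surface (crude "release" boundary)
D  <- 1e-2                      # diffusion coefficient (illustrative)
R  <- 1                         # sphere radius
nr <- 50                        # radial grid points
r  <- seq(0, R, length.out = nr)
dr <- r[2] - r[1]
dt <- 0.1 * dr^2 / D            # well inside the explicit stability limit

conc <- rep(1, nr)              # uniform initial concentration
conc[nr] <- 0                   # absorbing surface
for (step in 1:500) {
  lap <- c(6 * (conc[2] - conc[1]) / dr^2,            # symmetry at r = 0
           diff(conc, differences = 2) / dr^2 +
             (2 / r[2:(nr - 1)]) *
               (conc[3:nr] - conc[1:(nr - 2)]) / (2 * dr))
  conc[1:(nr - 1)] <- conc[1:(nr - 1)] + dt * D * lap
  conc[nr] <- 0                                       # hold the boundary
}
# conc now holds the (approximate) radial concentration profile
```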

Hope this helps,
Ravi.



Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


- Original Message -
From: carfer...@alum.us.es
Date: Thursday, October 22, 2009 9:45 am
Subject: [R] Diffusion of particles inside a sphere
To: r-help@r-project.org


> Hello veryone,
> 
> I am interested in the diffusion of particles inside a sphere, and its 
> release through a small pore on the sphere surface. Unfortunately, I 
> have not found the way to do this in R. Could you help me?
> 
> Thank very much in advance for your help
> 



[R] How to calculate the area under the curve

2009-10-22 Thread olivier.abz

Hi all, 

I would like to calculate the area under the ROC curve for my predictive
model. I have managed to plot points giving me the ROC curve. However, I do
not know how to get the value of the area under. 
Does anybody know of a function that would give the result I want using an
array of specificity and an array of sensitivity as input?

Thanks, 

Olivier



[R] Replacing multiple elements in a vector !

2009-10-22 Thread Praveen Surendran
Hi,

 

I have a vector with elements 

 

rs.id=c('rs100','rs101','rs102','rs103')

 

And a dataframe 'snp.id'

 

1  SNP_100  rs100

2  SNP_101  rs101

3  SNP_102  rs102

4  SNP_103  rs103

 

The task is to replace each element of the rs.id vector with the corresponding 'SNP_' id from snp.id.

 

Thanks in advance.
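A base-R sketch of one way to do this, reconstructing the example objects by hand (the column names snp and rs are assumptions, not from the original data):

```r
# Hypothetical reconstruction of the example objects
rs.id  <- c('rs100', 'rs101', 'rs102', 'rs103')
snp.id <- data.frame(snp = c('SNP_100', 'SNP_101', 'SNP_102', 'SNP_103'),
                     rs  = c('rs100', 'rs101', 'rs102', 'rs103'),
                     stringsAsFactors = FALSE)

# Look up each rs id and take the matching SNP_ id
rs.id <- snp.id$snp[match(rs.id, snp.id$rs)]
rs.id   # "SNP_100" "SNP_101" "SNP_102" "SNP_103"
```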

 

Praveen Surendran

2G, Complex and Adaptive Systems Laboratory (UCD CASL)

School of Medicine and Medical Sciences

University College Dublin

Belfield, Dublin 4

Ireland.


[[alternative HTML version deleted]]



[R] sunflowerplot - digits parameter separate for x and y values?

2009-10-22 Thread Rainer M Krug
Hi

I have a sunflowerplot, in which the x and y axis cover different ranges
(different orders of magnitude). Is it possible to specify different
"digits" for the x and y axis?

Rainer


-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology,
UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Cell:   +27 - (0)83 9479 042
Fax:+27 - (0)86 516 2782
Fax:+49 - (0)721 151 334 888
email:  rai...@krugs.de

Skype:  RMkrug
Google: r.m.k...@gmail.com

[[alternative HTML version deleted]]



Re: [R] Normality test

2009-10-22 Thread Duncan Murdoch

On 10/22/2009 9:48 AM, rkevinbur...@charter.net wrote:

I am having a hard time interpreting the results of the 'shapiro.test' for 
normality. If I do ?shapiro.test I see two examples using rnorm and runif. When 
I run the test using rnorm I get a wide variation of results. Most of this may 
be from the variability of rnorm, small sample size (limited to 5000 for the test), 
etc., but if I repeat the test multiple times I can get:


shapiro.test(rnorm(4900, mean = 5, sd = 3))


Shapiro-Wilk normality test

data:  rnorm(4900, mean = 5, sd = 3) 
W = 0.9994, p-value = 0.09123


With a p-value of 0.09 it doesn't give me a lot of confidence that either rnorm is 
producing a normal distribution or this test is very reliable. Obviously this test has 
gained wide acceptance, so I was wondering if I am expecting too much? Is there a 
"better" test?



I think you don't understand what p-values mean.  If the null is true, p 
is distributed as U(0,1).
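That uniformity is easy to check by simulation:

```r
# Under the null (the data really are normal), Shapiro-Wilk p-values
# are approximately uniform on (0, 1)
set.seed(123)
p <- replicate(1000, shapiro.test(rnorm(100))$p.value)
mean(p < 0.05)   # close to the nominal 0.05
```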


You can see

Murdoch, D.J., Tsai, Y.-L. and Adcock, J. (2008).  P-values are random
variables.  The American Statistician, 242-245.

for more details (and exceptions to this very general rule).

Duncan Murdoch



Re: [R] Putting names on a ggplot

2009-10-22 Thread John Kane


--- On Tue, 10/20/09, hadley wickham  wrote:

> From: hadley wickham 
> Subject: Re: [R] Putting names on a ggplot
> To: "John Kane" 
> Cc: m...@z107.de, "R R-help" 
> Received: Tuesday, October 20, 2009, 10:59 AM
> On Sun, Oct 18, 2009 at 10:29 AM,
> John Kane 
> wrote:
> > Thanks Stefan, the annotate approach works
> beautifully.  I had not got that far in Hadley's book
> apparently :(
> >
> > I'm not convinced though that the explanation
> >
> >> you shouldn't use aes in this case since nampost,
> >> temprange, ... are not
> >> part of the dataframe year.
> >
> > makes sense since it seems to work in this case unless
> I am missing something obvious. Mind you I'm good at missing
> the obvious especially in ggplot.
> 
> It currently works (because I can't figure out how to make
> it an error) but you really should not do it.
> 
> Hadley

Now you tell me! I was a bit surprised when it worked the first time but I'm 
still stumbling around with ggplot.  I'll put it into my list of things _not_ 
to do.   

I vote for fortunes too. 






