Re: [R] foreach: how to create array (of lists) as result?

2011-05-19 Thread mhofert
Okay, I should have remembered... Gabor suggested a pretty nice solution,
see:

http://r.789695.n4.nabble.com/How-to-create-an-array-of-lists-of-multiple-components-td3167576.html
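One way to get such a structure (not necessarily identical to the linked solution, and with illustrative names): an R array can have list mode, so each cell can hold a full list of results.

```r
# An array whose cells are lists: create a list of the right length,
# then give it dimensions. Each cell can then hold an arbitrary result.
res <- array(vector("list", 2 * 4), dim = c(2, 4))
res[[1, 1]] <- list(mean = 0.5, n = 10L)
res[[2, 3]] <- list(mean = -1.2, n = 7L)
res[[1, 1]]$n   # access a component of one cell
```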

--
View this message in context: 
http://r.789695.n4.nabble.com/foreach-how-to-create-array-of-lists-as-result-tp3535393p3537776.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] identical function names from 2 packages

2011-05-19 Thread Jari Oksanen
Duncan Murdoch  gmail.com> writes:

> 
> On 18/05/2011 10:02 PM, Nick Matzke wrote:
> > Hi,
> >
> > If I load 2 packages that have a function with the same
> > name, how do I tell R to run one or the other?
> >

> 
> If you are using a package without a namespace, expect problems.  Having 
> the namespace is nearly as good as automatically adding the pkg:: prefix 
> to every call from functions within the package.  Not having the 
> namespace is nearly as bad as never using the prefix, even when you should.
> 
I fail to see how namespaces could help here. If you have identical function
names in two packages, one will still mask the other, and you'll need an
explicit prefix (::) to tell R which one to use. Here is an example:

> lm(y ~ x)
Error in lm(y ~ x) : got you!
> lm
function (x, ...) 
stop("got you!")


So there is an unreleased (fortunately!) package called "teaser"
which has only one function, lm(), that does nothing useful but
mask the more useful function lm() in package "stats".  Both packages
("teaser" and "stats") have namespaces, but that didn't help here.

Namespaces may be useful when you have stray objects from two different
packages that should be handled by their dedicated support functions,
but I fail to see how a namespace could help in resolving which of the
two exported functions to use at the top level.
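A minimal sketch of the explicit-prefix workaround, with a locally defined lm() standing in for the hypothetical "teaser" package:

```r
# A local lm() masks stats::lm just as a loaded package would
lm <- function(x, ...) stop("got you!")

# The unqualified name finds the mask ...
masked <- tryCatch(lm(dist ~ speed, data = cars), error = conditionMessage)
masked  # "got you!"

# ... but the :: prefix always reaches the intended function
fit <- stats::lm(dist ~ speed, data = cars)
coef(fit)

rm(lm)  # remove the mask again
```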

I have honest intentions with this query, since I know that the package I
maintain has a function with the same name as one in another package that is
often used together with ours. My solution was to write a function that
detects the case and warns the user that the object they have was created
by that other package and cannot be handled adequately here. However, this
does not remove the confusion when people think they are using one package
but are actually using another for the top-level analysis.

If there is a resolution to this, I'll be happy.

Cheers, Jari Oksanen



Re: [R] R-help Digest, Vol 99, Issue 17

2011-05-19 Thread Vikas Garud
> --
>
> Message: 15
> Date: Tue, 17 May 2011 13:03:57 +0100
> From: Timothy Bates 
> To: R list 
> Subject: Re: [R] Box Plot under GUI (R Commander/RKward)
> Message-ID: <9ca34005-9076-439f-aec2-1931e1472...@gmail.com>
> Content-Type: text/plain; charset=windows-1252
>
> something like this will get you going, assuming your data are in a data frame
> called 'qual'
>
> # qual <- read.table(pipe("pbpaste"), header=T, sep='\t')
> boxplot(formula=Time~Distance+Season, data=qual)
>
>
> Followup question from me:
>
> I can't see why
> boxplot(formula=Time, data=qual)
>
> should return the error "can't find Time".  "Time~" and "~Time" don't help.
>
> Doesn't that make the formula interface to boxplot broken for one-variable,
> non-grouped plots?
>

Thanks, Timothy, for the help.  Just for background, I am attaching
Minitab 15 output.  I would like equivalent output so that people can
see the power of tools like the box plot.  If convinced, the clients
could make the effort to learn R with rudimentary GUI support.

I used an RKWard and R console combination: RKWard to load the data
and set it active, and in the R console I ran
boxplot(formula=Time~Distance+Transporter+Tonnage, data=Transportation.Time).
The resulting output (attached to this mail) is the best equivalent so
far, and I will be able to use it in training sessions.

And yes, the formula seems broken for single variable without any grouping.
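On the single-variable case: the formula method does need a right-hand side, but an ungrouped box plot is available through the default method. A sketch with made-up data (the 'qual' columns here are invented):

```r
qual <- data.frame(Time     = rexp(40, rate = 1/5),
                   Distance = gl(2, 20, labels = c("short", "long")))

boxplot(Time ~ Distance, data = qual)  # grouped: formula method
boxplot(qual$Time)                     # ungrouped: default method, no formula

# boxplot() also returns the five-number summaries it would draw
bs <- boxplot(qual$Time, plot = FALSE)
```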

Thanks for the help.
Vikas Garud


Re: [R] How to do covariate adjustment in R

2011-05-19 Thread karena
Thank you so much for this reply, Peter. It helps.

I know this is one way to adjust for covariates. However, what I want is
the 'remaining values' after adjustment. For example, say the 'gene
expression' value is denoted 'ge'; then for each gene,
ge = a*age + b*sex + c*per_se

My question is: how do I get the value of 'per_se' for each gene?
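One common reading of this question: the 'per_se' term is (up to the coefficient c) what is left after regressing out age and sex, i.e. the residuals of the covariate-only fit. A sketch with simulated data (all names hypothetical):

```r
set.seed(1)
n   <- 50
age <- rnorm(n, mean = 40, sd = 10)
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
ge  <- 0.02 * age + 0.3 * (sex == "M") + rnorm(n)  # one gene's expression

# The covariate-adjusted values are the residuals of ge ~ age + sex
per_se <- resid(lm(ge ~ age + sex))
head(per_se)
```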

thanks again.

Karena

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-do-covariate-adjustment-in-R-tp3537463p3537711.html
Sent from the R help mailing list archive at Nabble.com.



[R] Which package to use when fit Mixture regression model with ......................

2011-05-19 Thread cori.shen
I'm pretty new to R and trying to learn quickly for a project. I wonder if
anyone can advise on which package to use to fit a mixture regression model
with both categorical (more than two levels) and continuous variables.
I tried flexmix, recoding the multi-level variables as several 0-1 dummy
variables, but it always reports that the mixture model fails to converge.
I guess I must be doing something wrong. Does the number of variables
matter? Say I have 13000 observations and 50 variables; will it work?
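For what it's worth, flexmix() accepts factors directly in the model formula (the dummy coding happens internally via model.matrix), so the manual 0-1 recoding should not be necessary. A hedged sketch on simulated data; whether a model with 50 predictors and 13000 rows converges depends on starting values and the chosen number of components:

```r
set.seed(42)
n <- 200
d <- data.frame(x = rnorm(n),
                g = factor(sample(letters[1:3], n, replace = TRUE)))
# Two latent regimes with different regressions on x
d$y <- ifelse(runif(n) < 0.5, 1 + 2 * d$x, -1 - d$x) + rnorm(n, sd = 0.3)

# Requires the CRAN package 'flexmix'; guarded so the sketch is self-contained
if (requireNamespace("flexmix", quietly = TRUE)) {
  fit <- flexmix::flexmix(y ~ x + g, data = d, k = 2)  # k = number of components
  print(summary(fit))
}
```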

--
View this message in context: 
http://r.789695.n4.nabble.com/Which-package-to-use-when-fit-Mixture-regression-model-with-tp3537623p3537623.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] How to do covariate adjustment in R

2011-05-19 Thread Peter Langfelder
On Thu, May 19, 2011 at 7:21 PM, karena  wrote:
> Hi, I have a question about how to do covariate adjustment.
>
> I have two sets of 'gene expression' data. They are from two different
> tissue types, 'liver' and 'brain', respectively.
> The purpose of my analysis is to compare the pattern of the whole genome
> 'gene expression' between the two tissue types.
> I have 'age' and 'sex' as covariates. Since 'age' and 'sex' definitely have
> influence on gene expression, I need to first filter out the proportion of
> 'gene expression' attributable to 'age' and 'sex', then compare the
> 'remaining gene expression' value between 'liver' and 'brain'.
> How to do the covariate adjustment to remove the effects of these two
> covariates?
> Should I do a 'step-wise' regression or something?
> Which function in R should I use?
>
> I am new to this field, and really appreciate your help!

Go to your local library and get an introductory book on linear
regression (or linear models) that also covers the basics of
multivariate regression.

Once you have learned a little about regression, read at least the first
few chapters of Julian Faraway's book on regression in R:
cran.r-project.org/doc/contrib/Faraway-PRA.pdf

When you're done, use roughly the following commands, assuming age is
a numeric variable and expression contains the expression data with
genes in columns.

sex.num = as.numeric(as.factor(sex))
tissue.num = as.numeric(as.factor(tissue))
fit = lm(expression~sex.num + age + tissue.num)

The fitted object will be a multivariate fit; summary(fit) gives a list
with one element per gene. In each element, look at the component
$coefficients[4,4], which is the p-value for the hypothesis that the
coefficient of tissue is zero.

Since you likely have thousands of genes, you will have to apply a
multiple-testing correction, which you could do, for example, with the
qvalue package, whose qvalue() function estimates the false discovery
rate.
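If installing the qvalue package is a hurdle, base R's p.adjust() offers the Benjamini-Hochberg FDR correction (not identical to q-values, but the same idea). A sketch with made-up p-values:

```r
pvals <- c(0.0005, 0.004, 0.03, 0.2, 0.7)  # hypothetical per-gene p-values
padj  <- p.adjust(pvals, method = "BH")    # Benjamini-Hochberg FDR
padj
# Genes significant at an FDR of 5%
which(padj < 0.05)
```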

Peter



Re: [R] Adding a numeric to all values in the dataframe

2011-05-19 Thread Bill.Venables
For that kind of operation (unusual as it is) work with numeric matrices.  When 
you are finished, if you still want a data frame, make it then, not before.

If your data starts off as a data frame, turn it into a matrix first, e.g.

myMatrix <- as.matrix(myData)
myMatrix2 <- myMatrix + 2
myData2 <- as.data.frame(myMatrix2)

Bill Venables.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Ramya
Sent: Friday, 20 May 2011 1:54 PM
To: r-help@r-project.org
Subject: [R] Adding a numeric to all values in the dataframe

Hi there

I just want to add 2 to all the values in a data frame.

I tried using sapply but it seems to die every time.

Any help would be appreciated.

Thanks
Ramya

--
View this message in context: 
http://r.789695.n4.nabble.com/Adding-a-numeric-to-all-values-in-the-dataframe-tp3537594p3537594.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Adding a numeric to all values in the dataframe

2011-05-19 Thread Jorge Ivan Velez
Hi,

Does the following work for you?

set.seed(123)
d <- data.frame(x = rpois(10, 4), y = rnorm(10))
d
d + 2

See ?within for one more option.

HTH,
Jorge


On Thu, May 19, 2011 at 11:54 PM, Ramya <> wrote:

> Hi there
>
> I just want to add 2 to all the values in a data frame.
>
> I tried using sapply but it seems to die every time.
>
> Any help would be appreciated.
>
> Thanks
> Ramya
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Adding-a-numeric-to-all-values-in-the-dataframe-tp3537594p3537594.html
> Sent from the R help mailing list archive at Nabble.com.
>
>




Re: [R] extraction of mean square value from ANOVA

2011-05-19 Thread Cheryl Johnson
Thanks for your help. How would I extract each of the 3 values in the vector
individually?

Thanks again

On Thu, May 19, 2011 at 10:40 PM, Rolf Turner wrote:

> On 20/05/11 13:46, Cheryl Johnson wrote:
>
>> Hello,
>>
>> I am randomly generating values and then using an ANOVA table to find the
>> mean square value. I would like to form a loop that extracts the mean
>> square
>> value from ANOVA in each iteration. Below is an example of what I am
>> doing.
>>
>> a<-rnorm(10)
>> b<-factor(c(1,1,2,2,3,3,4,4,5,5))
>> c<-factor(c(1,2,1,2,1,2,1,2,1,2))
>>
>> mylm<-lm(a~b+c)
>> anova(mylm)
>>
>> Since I would like to use a loop to generate this several times it would
>> be
>> helpful to know how to extract the mean square value from ANOVA.
>>
>
> anova(mylm)[["Mean Sq"]]
>
> strangely enough. :-)
>
> This gives you a *vector* (of length 3 in your setting), the last
> entry of which is the error (or residual) mean square, which is
> probably what you want since you refer to the ``mean square value''
> (singular).
>
>cheers,
>
>Rolf Turner
>




[R] Adding a numeric to all values in the dataframe

2011-05-19 Thread Ramya
Hi there

I just want to add 2 to all the values in a data frame.

I tried using sapply but it seems to die every time.

Any help would be appreciated.

Thanks
Ramya

--
View this message in context: 
http://r.789695.n4.nabble.com/Adding-a-numeric-to-all-values-in-the-dataframe-tp3537594p3537594.html
Sent from the R help mailing list archive at Nabble.com.



[R] How to do covariate adjustment in R

2011-05-19 Thread karena
Hi, I have a question about how to do covariate adjustment.

I have two sets of 'gene expression' data. They are from two different
tissue types, 'liver' and 'brain', respectively. 
The purpose of my analysis is to compare the pattern of the whole genome
'gene expression' between the two tissue types. 
I have 'age' and 'sex' as covariates. Since 'age' and 'sex' definitely have
influence on gene expression, I need to first filter out the proportion of
'gene expression' attributable to 'age' and 'sex', then compare the
'remaining gene expression' value between 'liver' and 'brain'.
How to do the covariate adjustment to remove the effects of these two
covariates?
Should I do a 'step-wise' regression or something?
Which function in R should I use?

I am new to this field, and really appreciate your help!

thank you very much,

karena

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-do-covariate-adjustment-in-R-tp3537463p3537463.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Converting the graphics window to a data matrix

2011-05-19 Thread Beutel, Terry S
Thanks Barry. I have now got as far as creating the PNG (or GIF or TIFF)
images, as per your suggestions, but I am unable to work out how to convert
each one into a two-dimensional numeric matrix showing the spatial
distribution of occupied/unoccupied pixels in R. Any suggestions, anybody?
Thanks

Terry

-Original Message-
From: b.rowling...@googlemail.com [mailto:b.rowling...@googlemail.com] On 
Behalf Of Barry Rowlingson
Sent: Thursday, 19 May 2011 4:19 PM
To: Beutel, Terry S
Cc: r-help@r-project.org
Subject: Re: [R] Converting the graphics window to a data matrix

On Thu, May 19, 2011 at 5:19 AM, Beutel, Terry S 
 wrote:
> I'm wondering if anyone is aware of a way to take what is visible in 
> the R graphics window, pixelate it, and express that pixellation in a 
> numeric matrix. For example, I am using the following command to 
> create a  black (present) and white (absent) spatial simulation of 
> grass tussock distribution.
>
>>
> symbols(x=runif(100,-.5,1.5), y=runif(100,-.5,1.5), circles=runif(100)/30,
>         inches=F, bg=1, xlim=c(0,1), ylim=c(0,1))
>
> What I would like to end up with is a 2 dimensional matrix that 
> indicates presence/absence of grass in any given pixel as per the 
> image generated from the above graphics plot.

 Instead of drawing to the screen, you can use the 'png' function to create a 
graphics device that "draws" to an image file. Then you can read that image 
file into R and get it as a matrix.

 You probably also want to set some of the margin and axis parameters so the 
plotting region fills the whole device, and you don't have the axes and tick 
marks etc. confusing the matrix.

 That's some pointers for starters. I've not had breakfast yet...

Barry
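Putting those pointers together: render to a file with no margins, then read the file back. Reading a PNG into a matrix is not in base R; the sketch below assumes the CRAN package 'png' for readPNG():

```r
f <- tempfile(fileext = ".png")
png(f, width = 200, height = 200)
par(mar = c(0, 0, 0, 0))                  # no margins, no axes
plot.new()
plot.window(xlim = c(0, 1), ylim = c(0, 1), xaxs = "i", yaxs = "i")
symbols(x = runif(100, -0.5, 1.5), y = runif(100, -0.5, 1.5),
        circles = runif(100)/30, inches = FALSE, bg = 1, fg = 1, add = TRUE)
dev.off()

if (requireNamespace("png", quietly = TRUE)) {
  img <- png::readPNG(f)          # height x width x channels array in [0, 1]
  occupied <- img[, , 1] < 0.5    # TRUE where a black tussock was drawn
  print(dim(occupied))            # 200 x 200 logical matrix
}
```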


DISCLAIMER
The information contained in the above e-mail message or messages 
(which includes any attachments) is confidential and may be legally 
privileged.  It is intended only for the use of the person or entity 
to which it is addressed.  If you are not the addressee any form of 
disclosure, copying, modification, distribution or any action taken 
or omitted in reliance on the information is unauthorised.  Opinions 
contained in the message(s) do not necessarily reflect the opinions 
of the Queensland Government and its authorities.  If you received 
this communication in error, please notify the sender immediately 
and delete it from your computer system network.




Re: [R] extraction of mean square value from ANOVA

2011-05-19 Thread Bill.Venables
That only applies if you have the same factors a and b each time.  If that is 
the case, you can do things in a much slicker way.

u <- matrix(rnorm(5000), nrow = 10)  ## NB nrow: each column is one sample
AB <- expand.grid(a = letters[1:2], b = letters[1:5])
M <- lm(u ~ a + b, AB)
rmsq <- colSums(resid(M)^2)/M$df.residual

and Bob's your uncle.  

If you really want to do it quickly you would bypass lm() altogether and use 
something like lsfit() or, at an even lower level, qr() and qr.resid().  There 
are umpteen ways of fitting linear models.

Bill Venables.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Dennis Murphy
Sent: Friday, 20 May 2011 1:38 PM
To: Cheryl Johnson
Cc: r-help@r-project.org
Subject: Re: [R] extraction of mean square value from ANOVA

Hi:

It's easier to use an apply family function than a loop for this type
of problem, as illustrated below:

# Generate 30 random samples of size 10 from a standard
# normal distribution and put them into a matrix
u <- matrix(rnorm(300), ncol = 10)
a <- factor(rep(1:5, each = 2))
b <- factor(rep(1:2, 5))

# Side note: it's not a good idea to name an object 'c' because a
# commonly used function in the base package has that name already, as
# in c(1, 3, 5)...

# A function to fit the model and to compute the MSE
# deviance() returns the residual sum of squares in a linear model
meansq <- function(y) {
  m <- lm(y ~ a + b)
  deviance(m)/df.residual(m)
}

# Apply the function to each row of u; the result is a vector of MSEs
msevec <- apply(u, 1, meansq)
msevec

Note 2: The function works if a and b are in the same environment as
the meansq() function. If you do all of this in the console, there
should be no problem. If you decide to put all of this into a
function, then you need to be more careful.

HTH,
Dennis

On Thu, May 19, 2011 at 6:46 PM, Cheryl Johnson
 wrote:
> Hello,
>
> I am randomly generating values and then using an ANOVA table to find the
> mean square value. I would like to form a loop that extracts the mean square
> value from ANOVA in each iteration. Below is an example of what I am doing.
>
> a<-rnorm(10)
> b<-factor(c(1,1,2,2,3,3,4,4,5,5))
> c<-factor(c(1,2,1,2,1,2,1,2,1,2))
>
> mylm<-lm(a~b+c)
> anova(mylm)
>
> Since I would like to use a loop to generate this several times it would be
> helpful to know how to extract the mean square value from ANOVA.
>
> Thanks
>
>
>



Re: [R] extraction of mean square value from ANOVA

2011-05-19 Thread Dennis Murphy
Hi:

It's easier to use an apply family function than a loop for this type
of problem, as illustrated below:

# Generate 30 random samples of size 10 from a standard
# normal distribution and put them into a matrix
u <- matrix(rnorm(300), ncol = 10)
a <- factor(rep(1:5, each = 2))
b <- factor(rep(1:2, 5))

# Side note: it's not a good idea to name an object 'c' because a
# commonly used function in the base package has that name already, as
# in c(1, 3, 5)...

# A function to fit the model and to compute the MSE
# deviance() returns the residual sum of squares in a linear model
meansq <- function(y) {
  m <- lm(y ~ a + b)
  deviance(m)/df.residual(m)
}

# Apply the function to each row of u; the result is a vector of MSEs
msevec <- apply(u, 1, meansq)
msevec

Note 2: The function works if a and b are in the same environment as
the meansq() function. If you do all of this in the console, there
should be no problem. If you decide to put all of this into a
function, then you need to be more careful.

HTH,
Dennis

On Thu, May 19, 2011 at 6:46 PM, Cheryl Johnson
 wrote:
> Hello,
>
> I am randomly generating values and then using an ANOVA table to find the
> mean square value. I would like to form a loop that extracts the mean square
> value from ANOVA in each iteration. Below is an example of what I am doing.
>
> a<-rnorm(10)
> b<-factor(c(1,1,2,2,3,3,4,4,5,5))
> c<-factor(c(1,2,1,2,1,2,1,2,1,2))
>
> mylm<-lm(a~b+c)
> anova(mylm)
>
> Since I would like to use a loop to generate this several times it would be
> helpful to know how to extract the mean square value from ANOVA.
>
> Thanks
>
>
>



Re: [R] extraction of mean square value from ANOVA

2011-05-19 Thread Rolf Turner

On 20/05/11 14:51, Cheryl Johnson wrote:
Thanks for your help. How would I extract each of the 3 values in the 
vector individually?


If you are going to use R, it would pay you to learn a little bit about it.
Like, e.g., vector indexing.

In your example

anova(mylm)[["Mean Sq"]][3]

would give the residual mean square.

More perspicuously:

mnsq <- anova(mylm)[["Mean Sq"]] # A vector of length 3
MSE <- mnsq[3] # Mean square for error.
MSB <- mnsq[1] # Mean square for factor b
MSC <- mnsq[2] # Mean square for factor c

Do read ``An Introduction to R'' as the Posting Guide suggests.

cheers,

Rolf Turner



Re: [R] extraction of mean square value from ANOVA

2011-05-19 Thread Rolf Turner

On 20/05/11 13:46, Cheryl Johnson wrote:

Hello,

I am randomly generating values and then using an ANOVA table to find the
mean square value. I would like to form a loop that extracts the mean square
value from ANOVA in each iteration. Below is an example of what I am doing.

a<-rnorm(10)
b<-factor(c(1,1,2,2,3,3,4,4,5,5))
c<-factor(c(1,2,1,2,1,2,1,2,1,2))

mylm<-lm(a~b+c)
anova(mylm)

Since I would like to use a loop to generate this several times it would be
helpful to know how to extract the mean square value from ANOVA.


anova(mylm)[["Mean Sq"]]

strangely enough. :-)

This gives you a *vector* (of length 3 in your setting), the last
entry of which is the error (or residual) mean square, which is
probably what you want since you refer to the ``mean square value''
(singular).

cheers,

Rolf Turner



[R] extraction of mean square value from ANOVA

2011-05-19 Thread Cheryl Johnson
Hello,

I am randomly generating values and then using an ANOVA table to find the
mean square value. I would like to form a loop that extracts the mean square
value from ANOVA in each iteration. Below is an example of what I am doing.

a<-rnorm(10)
b<-factor(c(1,1,2,2,3,3,4,4,5,5))
c<-factor(c(1,2,1,2,1,2,1,2,1,2))

mylm<-lm(a~b+c)
anova(mylm)

Since I would like to use a loop to generate this several times it would be
helpful to know how to extract the mean square value from ANOVA.

Thanks




Re: [R] Anyone successfully install Rgraphviz on windows with R 2.13?

2011-05-19 Thread Gabor Grothendieck
On Thu, May 19, 2011 at 4:28 PM, Michael Conklin
 wrote:
> I have been trying to get Rgraphviz to work (I know it is from Bioconductor) 
> unsuccessfully. Since I have no experience with Bioconductor I thought I 
> would ask here if anyone has advice. I have installed Graphviz 2.20.3 as is 
> recommended on the Bioconductor site but basically R cannot seem to find the 
> needed dll files.  So, even though I have added the appropriate directories 
> to the system path R cannot seem to find them. Any tips would be appreciated.
>

Be sure to read the installation instructions.  Unfortunately they are
really well hidden: you have to download and untar the source package and
then look at the README inside it.

Regarding the path, I have Graphviz installed in C:\Program
Files\Graphviz2.20 on my Windows Vista system, yet it works, so at
least on Windows I don't think it matters that there are spaces in the
path.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] A better way to do this

2011-05-19 Thread Dennis Murphy
Hi:

Could you post a small, reproducible data set that illustrates what
you want to do? It sounds like you're creating 'spaghetti plots',
which can be done with a minimal amount of pain in ggplot2.

Dennis

On Thu, May 19, 2011 at 11:29 AM, 1Rnwb  wrote:
> Hello gurus,
>
> I have a dataframe containing two groups, viz. 'control' and 'case'; each of
> these groups contains longitudinal data for 100 subjects. I have to plot all
> these subjects on a single chart and then add a regression line for each of
> the groups across all the subjects. I have written a function to do the chart:
> grpcharts <- function(dat, group, group2, molecule, cutoff) {
>   dat2 <- dat[grep(group, dat$Group), ]
>   ylim <- log2(c(min(dat2[, molecule], na.rm = TRUE) + 4,
>                  max(dat2[, molecule], na.rm = TRUE) + 1))
>   all.sub.id <- dat2$Subject.ID
>   if (group == 'control') col <- 'blue' else col <- 'red'
>   if (group2 == 'case') col2 <- 'red' else col2 <- 'blue'
>   uniq.sub.id <- unique(all.sub.id)
>   errcol <- c()
>   for (i in 1:length(uniq.sub.id)) {
>     sub <- dat2[grep(uniq.sub.id[i], dat2$Subject.ID), ]
>     sub <- sub[order(sub$Age.at.Sample.Collection), ]
>     sub <- sub[sub[, molecule] > cutoff, ]
>     sub.id <- uniq.sub.id[i]
>     if (dim(sub)[1] <= 1) errcol <- c(errcol, sub.id)
>     if (dim(sub)[1] > 1) {
>       par(new = TRUE)
>       plot(log2(sub[, molecule]) ~ sub$Age.at.Sample.Collection,
>            ylab = paste('Log2_', molecule, sep = ''),
>            xlab = "Age at Sample", pch = 1, ylim = ylim, xlim = c(0, 25),
>            main = paste(group, '-', molecule))
>       mod <- loess(log2(sub[, molecule]) ~ Age.at.Sample.Collection,
>                    na.action = na.exclude, data = sub)
>       pred <- predict(mod)
>       lines(pred ~ Age.at.Sample.Collection, data = sub, lwd = 1, lty = 1)
>     }
>   }
>   dat2 <- dat2[order(dat2$Age.at.Sample.Collection), ]
>   mod <- loess(log2(dat2[, molecule]) ~ Age.at.Sample.Collection,
>                na.action = na.exclude, data = dat2)
>   pred <- predict(mod)
>   lines(pred ~ Age.at.Sample.Collection, data = dat2, lwd = 2, lty = 1,
>         col = col)
>   dat2 <- dat[grep(group2, dat$Group), ]
>   dat2 <- dat2[order(dat2$Age.at.Sample.Collection), ]
>   mod <- loess(log2(dat2[, molecule]) ~ Age.at.Sample.Collection,
>                na.action = na.exclude, data = dat2)
>   pred <- predict(mod)
>   lines(pred ~ Age.at.Sample.Collection, data = dat2, lwd = 2, lty = 1,
>         col = col2)
>   legend(c(20, 20), c(ylim), c(group, group2), lty = 1, lwd = 2,
>          col = c(col, col2), bty = 'n')
>   print('done')
> }
>
> The function subsets the data based on 'group' and plots the data points.
> Currently I am using if/else blocks to assign the colors under the two
> conditions. I need some pointers for assigning the colors to the regression
> lines for the two groups without that conditional code.
> thanks
> sharad
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/A-better-way-to-do-this-tp3536576p3536576.html
> Sent from the R help mailing list archive at Nabble.com.
>
>
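On the colour question in the quoted post: the if/else blocks can be replaced with a vectorized lookup, e.g. ifelse() or a named vector (a sketch, with hypothetical group labels):

```r
groups <- c("control", "case")

# ifelse() version: one expression instead of two if/else blocks
col <- ifelse(groups == "control", "blue", "red")

# named-vector lookup, handy if more groups are added later
palette_map <- c(control = "blue", case = "red")
col2 <- unname(palette_map[groups])

col
col2
```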



Re: [R] Anyone successfully install Rgraphviz on windows with R 2.13?

2011-05-19 Thread Paul Murrell

Hi

In addition to getting the Graphviz version EXACTLY right (as per the 
README file) and ensuring that the right paths are set up, I also 
seemed to need to install Graphviz in a path WITHOUT spaces (but that 
may just have been superstition).


Paul

On 20/05/2011 8:28 a.m., Michael Conklin wrote:

I have been trying to get Rgraphviz to work (I know it is from
Bioconductor) unsuccessfully. Since I have no experience with
Bioconductor I thought I would ask here if anyone has advice. I have
installed Graphviz 2.20.3 as is recommended on the Bioconductor site
but basically R cannot seem to find the needed dll files.  So, even
though I have added the appropriate directories to the system path R
cannot seem to find them. Any tips would be appreciated.

W. Michael Conklin Chief Methodologist Google Voice: (612) 56STATS

MarketTools, Inc. | www.markettools.com
6465 Wayzata Blvd | Suite 170 |  St. Louis Park, MN 55426.  PHONE:
952.417.4719 | CELL: 612.201.8978 This email and attachment(s) may
contain confidential and/or proprietary information and is intended
only for the intended addressee(s) or its authorized agent(s). Any
disclosure, printing, copying or use of such information is strictly
prohibited. If this email and/or attachment(s) were received in
error, please immediately notify the sender and delete all copies


[[alternative HTML version deleted]]

__ R-help@r-project.org
mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] changes in coxph in "survival" from older version?

2011-05-19 Thread Frank Harrell
Hi Tao,

For your situation (and even a MUCH larger number of events), multivariable
modeling will be unreliable unless you use shrinkage, variable selection
will select the wrong variables, and univariable screening leads to massive
bias in later stages.

Terry converted me from SAS to S-Plus in 1991 when I visited Mayo Clinic and
he showed me how natural the language was to put a loop around the kind of
stepwise analyses requested by users.  The bootstrap showed that the list of
predictors selected was very random.

Another demonstration of this is to bootstrap the ranks of the predictors,
ranked by any measure you want (adjusted chi-square, univariable chi-square,
ROC area).  The confidence intervals for the ranks will be extremely wide.
Frank
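
A minimal sketch of the bootstrap-the-ranks idea described above (all data and variable names here are hypothetical, and the ranking measure shown is the univariable likelihood-ratio chi-square):

```r
library(survival)
library(boot)

## Rank candidate predictors by univariable Cox chi-square in each
## bootstrap resample; wide percentile intervals on the ranks show how
## unstable "best predictor" claims are.
rank_predictors <- function(data, idx, vars = c("x1", "x2", "x3")) {
  d <- data[idx, ]
  chisq <- sapply(vars, function(v) {
    fit <- coxph(as.formula(paste("Surv(time, status) ~", v)), data = d)
    2 * diff(fit$loglik)            # likelihood-ratio chi-square
  })
  rank(-chisq)                       # rank 1 = apparently strongest
}

## b <- boot(mydata, rank_predictors, R = 500)
## apply(b$t, 2, quantile, c(0.025, 0.975))  # interval for each rank
```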


Shi, Tao wrote:
> 
> Thank you, Frank and Terry, for all your answers!  I'll upgrade my
> "survival" 
> package for sure!
> 
> It seems to me that you two are pointing to two different issues: 1) Is
> stepwise model selection a good approach (for any data)?  2) Does the
> data I have contain enough information to be worth modeling?  For #1,
> I'm not in a good position to judge and need to read up on it.  For #2,
> I'm still a bit confused about Terry's last comment.  If we forget about
> multivariate model building and just look at the variables one by one,
> selecting the best predictor (let's say it's highly significant, e.g.
> p < 0.0001), can the resulting univariate model still be wrong?
> 
> What if I use this data as a validation set to validate an existing model?
> Anything different?
> 
> Many thanks!
> 
> ...Tao
>
> 
> 
> 
> - Original Message 
>> From: Frank Harrell 
>> To: r-help@r-project.org
>> Sent: Tue, May 17, 2011 10:51:02 AM
>> Subject: Re: [R] changes in coxph in "survival" from older version?
>> 
>> It's worse if the model does converge because then you don't have a 
>> warning
>> about the result being nonsense.
>> Frank
>> 
>> 
>> Terry Therneau-2  wrote:
>> > 
>> > -- begin included message ---
>> > I did realize that  there are way more predictors in the model.  My
>> > initial thinking  was use that as an initial model for stepwise model
>> > selection.  Now  I  wonder if the model selection result is still valid
>> > if the  initial model didn't even converge?
>> > --- end inclusion ---
>> > 
>> > You have 17 predictors with only 22 events.  All methods of  "variable
>> > selection" in such a scenario will give essentially random  results.
>> > There is simply not enough information present to determine a  best
>> > predictor or best subset of predictors.  
>> > 
>> >  Terry Therneau
>> > 
>> >  __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the  posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and  provide commented, minimal, self-contained, reproducible code.
>> > 
>> 
>> 
>> -
>> Frank Harrell
>> Department of Biostatistics, Vanderbilt  University
>> --
>> View this message in context:  
>>http://r.789695.n4.nabble.com/changes-in-coxph-in-survival-from-older-version-tp3516101p3530024.html
>>
>> Sent  from the R help mailing list archive at Nabble.com.
>> 
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting  guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented,  minimal, self-contained, reproducible code.
>>
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/changes-in-coxph-in-survival-from-older-version-tp3516101p3537322.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] changes in coxph in "survival" from older version?

2011-05-19 Thread Shi, Tao
Thank you, Frank and Terry, for all your answers!  I'll upgrade my "survival" 
package for sure!

It seems to me that you two are pointing to two different issues: 1) Is
stepwise model selection a good approach (for any data)?  2) Does the data
I have contain enough information to be worth modeling?  For #1, I'm not in
a good position to judge and need to read up on it.  For #2, I'm still a bit
confused about Terry's last comment.  If we forget about multivariate model
building and just look at the variables one by one, selecting the best
predictor (let's say it's highly significant, e.g. p < 0.0001), can the
resulting univariate model still be wrong?

What if I use this data as a validation set to validate an existing model?
Anything different?

Many thanks!

...Tao
   



- Original Message 
> From: Frank Harrell 
> To: r-help@r-project.org
> Sent: Tue, May 17, 2011 10:51:02 AM
> Subject: Re: [R] changes in coxph in "survival" from older version?
> 
> It's worse if the model does converge because then you don't have a  warning
> about the result being nonsense.
> Frank
> 
> 
> Terry Therneau-2  wrote:
> > 
> > -- begin included message ---
> > I did realize that  there are way more predictors in the model.  My
> > initial thinking  was use that as an initial model for stepwise model
> > selection.  Now  I  wonder if the model selection result is still valid
> > if the  initial model didn't even converge?
> > --- end inclusion ---
> > 
> > You have 17 predictors with only 22 events.  All methods of  "variable
> > selection" in such a scenario will give essentially random  results.
> > There is simply not enough information present to determine a  best
> > predictor or best subset of predictors.  
> > 
> >  Terry Therneau
> > 
> >  __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the  posting guide
> > http://www.R-project.org/posting-guide.html
> > and  provide commented, minimal, self-contained, reproducible code.
> > 
> 
> 
> -
> Frank Harrell
> Department of Biostatistics, Vanderbilt  University
> --
> View this message in context:  
>http://r.789695.n4.nabble.com/changes-in-coxph-in-survival-from-older-version-tp3516101p3530024.html
>
> Sent  from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting  guide http://www.R-project.org/posting-guide.html
> and provide commented,  minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmer with 2 random effects with only two levels

2011-05-19 Thread Daniel Malter
What you should and can do depends on the expectations you have regarding the
correlation structure of the data and is limited by your degrees of freedom.
For example, what is wrong about:

reg<-lmer(response~femaleset+treatment+(1|group))
 
?

This assumes that there are constant group effects on a group's responses
(i.e., it allows for differences in mean responses between groups) and that
this constant group effect is independent of the effect of femaleset and
treatment on a group.

Best,
Daniel


--
View this message in context: 
http://r.789695.n4.nabble.com/lmer-with-2-random-effects-with-only-two-levels-tp3536791p3537184.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] *not* using attach() *but* in one case ....

2011-05-19 Thread Bill.Venables
>>> Martin Maechler writes: 
> 
> Well, then you don't know  *THE ONE* case where modern users of
> R should use attach() ... as I have been teaching for a while,
> but seem not to have got enough students listening ;-) ...
> 
>   ---  Use it instead of load()  {for save()d R objects} ---
> 
> The advantage of attach() over load() there is that loaded
> objects (and there may be a bunch!) are put into a separate
> place in the search path and will not accidentally overwrite
> objects in the global "workspace". 
> 
> Of course, there are still quite a few situations {e.g. in
> typical BATCH use of R for simulations, or Sweaving, etc} where
> load() is good enough, and the extras of using attach() are not
> worth it.
> 
> But the unconditional  "do not use attach()" 
> is not quite ok,
> at least not when you talk to non-beginners.
> 
> Martin Maechler, ETH Zurich

Curiously this is why I wrote the SOAR package.  Instead of save() you
use Store() (Frank Harrell had already taken 'Save') and instead of
attach() use Attach().  The objects are saved as separate rda files in
a special subdirectory and when Attach() is used on it they are placed
on the search path, normally at position 2, as promises.  The global
environment is kept relatively free of extraneous objects (if you want
to work like that) and the operation is fast and very low on memory
use, (unless you want to use every object you see all at once, of
course).

The "track" package from Tony Plate achieves somewhat similar results
but with a different method.  Essentially it implements something very
like the old S-PLUS style of keeping everything out of memory as files
in a .Data subdirectory, (although Tony uses the name 'rdatadir'),
unless actually in use.

Bill Venables.
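
For readers who have not seen the pattern Martin describes, a minimal sketch (the file name is made up for illustration):

```r
x <- 1:10
y <- letters
save(x, y, file = "mystuff.rda")   # the usual save()
rm(x, y)

attach("mystuff.rda")              # instead of load(): the objects go to
search()                           # search() position 2 ("file:mystuff.rda"),
x                                  # not into the global workspace
detach("file:mystuff.rda")
```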
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] K-M probability for each subject

2011-05-19 Thread David Winsemius


On May 19, 2011, at 5:20 PM, hsu ya-hui wrote:


Dear R-user,

I would like to get the survival probability (surv) for each subject  
id.

That is, I want to have additional column surv as follows:


The survival probability for each subject is 1 until death and 0  
thereafter.




id  OS  OS_DUR  surv



eg:
fitKM<-survfit(Surv(OS_DUR,OS)~1,data=data)
fitKM$surv # this will only give me survival probability and I do  
not know

the corresponding id

Any one knows how to get the probability?


What do you really want?

--
David Winsemius, MD
West Hartford, CT
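
One hedged guess at what is being asked: evaluate the pooled K-M curve at each subject's own follow-up time. The column names follow the original post; whether this is actually the quantity wanted is exactly David's question.

```r
library(survival)
fitKM <- survfit(Surv(OS_DUR, OS) ~ 1, data = data)
## turn the right-continuous K-M curve into a function of time
kmfun <- stepfun(fitKM$time, c(1, fitKM$surv))
data$surv <- kmfun(data$OS_DUR)   # one estimate per subject, row order kept
```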

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Converting Variable Name into String

2011-05-19 Thread Paolo Rossi
Hi Bert,

thanks for your advice on lapply, which I am still not very familiar with.
With regard to the issue of the names, I realised that I can simply call
names() on the data.frame, join them with other strings through paste(),
and add them to the data.frame by using the [] syntax rather than $.


Thanks

Paolo
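
The approach Paolo describes might look like this (QuarterlyData is from the thread; padding with NA is an assumption, made because diff() drops one row):

```r
lagged <- data.frame(lapply(QuarterlyData, diff))
names(lagged) <- paste("D1", names(QuarterlyData), sep = "")
QuarterlyData[names(lagged)] <- rbind(NA, lagged)  # [] syntax, not $
```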



On 19 May 2011 16:53, Bert Gunter  wrote:

> 1. This won't work. The lagged variables have length one less than the
> originals.
>
> 2. How about:
>
> lagged_Q <- data.frame( lapply( QuarterlyData,diff))
>
> You can then change the names in lagged_Q to something like
> lagged_originalName via paste() if you like.
>
> 3. I strongly suspect that none of this is necessary or wise: R  has
> numerous time series modeling and graphical capabilities that handle
> this automatically. I suggest you first  check the time series Task
> View on CRAN to see if something there already does what you want.
>
> -- Bert
>
> On Thu, May 19, 2011 at 8:05 AM, Paolo Rossi
>  wrote:
> > Hello,
> >
> > I would like to create lagged and delta variables from a set of variables
> > and then add them to a dataframe
> >
> > Suppose that GDPPcSa is a variable. I would like to be able to do this
> >
> > QuarterlyData$D1GdpPcSa = diff(GDPPcSa , 1)
> >
> > in an automated fashion so that I loop over Quartely data to compute the
> > first difference  of its variables and add them to the dataframe.
> >
> > .It would be great to get a way to create the string "D1GdpPcSa" knowing
> > that the name of the var is GdpPcSa. Then I can add the varibale
>  D1GdpPcSa
> > to the dataframe and work on its names attribute.
> >
> > Thanks
> >
> > Paolo
> >
> >[[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> "Men by nature long to get on to the ultimate truths, and will often
> be impatient with elementary studies or fight shy of them. If it were
> possible to reach the ultimate truths without the elementary studies
> usually prefixed to them, these would not be preparatory studies but
> superfluous diversions."
>
> -- Maimonides (1135-1204)
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Separating boot results

2011-05-19 Thread Jorge Ivan Velez
Hmmm...  I am sorry Patrick, it should have been

 a <- codboot[c(4)]
round(a$bca[4:5], 2)   # note I replaced "," by ":"

Best,
Jorge
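
For anyone tempted to parse the printed boot.ci() output as text: the bca element is a plain matrix, so the endpoints can be indexed directly (a sketch using the object names from this thread):

```r
ci    <- boot.ci(cod.boot, conf = 0.90, type = "bca")
lower <- ci$bca[1, 4]              # 6.10 in the example output above
upper <- ci$bca[1, 5]              # 10.23
## results <- cbind(results, lower, upper)  # 'results' is hypothetical
```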




>
> On Thu, May 19, 2011 at 3:08 PM, Patrick Santoso <> wrote:
>
>>  Thanks for the suggestion Jorge!
>>
>>  - that gives me subscript out of bounds error. perhaps I can tweak the
>> parameters? I've never worked with that command.
>>
>> Pat
>>
>>  On Thu, May 19, 2011 at 2:12 PM, Jorge Ivan Velez <> wrote:
>>
>>> Hi Patrick,
>>>
>>> How about this (untested)?
>>>
>>> a <- codboot[c(4)]
>>> round(a$bca[4, 5], 2)
>>>
>>> HTH,
>>> Jorge
>>>
>>>
>>> On Thu, May 19, 2011 at 7:20 AM, Patrick Santoso <> wrote:
>>>
 Good Morning,

 I'm having what I hope to be a simple problem. I am generating bootstrap
 confidence intervals using package (boot) - which works perfectly. The
 issue I am having is getting the results into a format which I can write
 out to a database. To be clear, I am having no problems generating the
 results; I just need to convert the format of the results such that I can
 store the results in a dataframe to save out to a database.

 I am doing the following:

 ## Generate 20,000 bootstrap samples
 cod.boot <- boot(trimmed$ratio, cod, R=2)

 ## generate 90% BCA boostrap confidence intervals
 codboot <- boot.ci(cod.boot,conf = c(0.90), type= c("bca"))

 At this point I have stored the answer I want in the variable codboot, but
 the problem is it is in the format as follows:

 BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
 Based on 2 bootstrap replicates

 CALL :
 boot.ci(boot.out = cod.boot, conf = c(0.9), type = c("bca"))

 Intervals :
 Level   BCa
 90%   ( 6.10, 10.23 )
 Calculations and Intervals on Original Scale

 What I would like is the 6.10 and 10.23 each stored in their own variables
 so I can combine them into my existing dataframe (using cbind). The best
 I've been able to do so far is:

 a <- codboot[c(4)]

 which gives me a:

 $bca
 conf
 [1,]  0.9 2343.41 19683.87 6.099788 10.23007

 which is closer, but I cannot parse this variable enough to get the 6.10 and
 10.23 out, since apparently strsplit doesn't allow splitting on spaces.

 Any help would be greatly appreciated!

 Best Regards & Thank You in Advance

 Patrick





 --

 Patrick Santoso

 University of New Hampshire

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

>>>
>>>
>>
>>
>> --
>>
>> Patrick Santoso
>>
>> 603 969 4673
>>
>>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] recursive function

2011-05-19 Thread Robert Baer

Perhaps this is useful:

x=c(-2,0,2)
sign(x)*abs(x)

[1] -2  0  2
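
Applied to the erf question in this thread: the Winitzki approximation returns |erf(x)| because x enters only as x^2, so sign() restores the odd symmetry (a sketch, not the original poster's exact code):

```r
erf <- function(x) {
  a <- 0.147                                     # Winitzki (2008) constant
  parta <- -x^2 * (4/pi + a*x^2) / (1 + a*x^2)
  sign(x) * sqrt(1 - exp(parta))                 # odd in x, like erf
}
erf(c(-2, 0, 2))   # negative for negative x, zero at zero
```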


--
Robert W. Baer, Ph.D.
Professor of Physiology
Kirksville College of Osteopathic Medicine
A. T. Still University of Health Sciences
800 W. Jefferson St.
Kirksville, MO 63501
660-626-2322
FAX 660-626-2965





--
From: "Tremblay, Pierre-Olivier" 
Sent: Thursday, May 19, 2011 2:42 PM
To: 
Subject: [R] recursive function


Hi,

I created a function for obtaining the normal cumulative distribution (I
know all this already exists in R, I just wanted to verify my
understanding of it).  below is the code I came up with.

cdf<-function(x) {
   erf<-function(x) {
  # approximation to the error function (erf) of the
  # normal cumulative distribution function
  # from Winitzki (2008)
 a   <- 0.147
 partc   <- 1+a*x^2
 partb   <- 4/pi+a*x^2
 parta   <- -x^2*(partb/partc)
 erf <- sqrt(1-exp(parta))
 erf
}
   #  cumulative density function
 cdf<- 1/2*(1+erf(x/sqrt(2)))
 cdf
   }

The erf(x) produces positive values whatever the sign of x.  Instead of
obtaining the expected sigmoid shape of the erf() I end up with a
positive values V shape.  erf(x) should be negative for negative values
of x.  I figure I need some form of conditional statement in the above
code but all my attempts at using "if" and "for" failed miserably.
Clearly, I don't understand how to use them properly in R.  If anyone
could point me in the right direction as on how to code conditional
events in R that would be much appreciated.

many thanks in advance

Pierre-Olivier


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] recursive function

2011-05-19 Thread Rolf Turner


(1) What has this to do with recursion?

(2) You probably need to use ifelse().  I believe that this is
(in effect) an FAQ.

cheers,

Rolf Turner

On 20/05/11 07:42, Tremblay, Pierre-Olivier wrote:

Hi,

I created a function for obtaining the normal cumulative distribution (I
know all this already exists in R, I just wanted to verify my
understanding of it).  below is the code I came up with.

cdf<-function(x) {
 erf<-function(x) {
# approximation to the error function (erf) of the
# normal cumulative distribution function
# from Winitzki (2008)
   a<- 0.147
   partc<- 1+a*x^2
   partb<- 4/pi+a*x^2
   parta<- -x^2*(partb/partc)
   erf<- sqrt(1-exp(parta))
   erf
  }
 #  cumulative density function
   cdf<- 1/2*(1+erf(x/sqrt(2)))
   cdf
 }

The erf(x) produces positive values whatever the sign of x.  Instead of
obtaining the expected sigmoid shape of the erf() I end up with a
positive values V shape.  erf(x) should be negative for negative values
of x.  I figure I need some form of conditional statement in the above
code but all my attempts at using "if" and "for" failed miserably.
Clearly, I don't understand how to use them properly in R.  If anyone
could point me in the right direction as on how to code conditional
events in R that would be much appreciated.

many thanks in advance


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to get rid of the index in result file

2011-05-19 Thread Jorge Ivan Velez
Hi Yighua,

Try

> m <- 2.35343
> m
[1] 2.35343
> cat(m)
2.35343

HTH,
Jorge
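
A couple of variants, since cat() separates its arguments with a space by default:

```r
m <- 2.35343
cat(m, "\n", sep = "")   # prints 2.35343 followed by a newline
writeLines(format(m))    # same effect, character conversion made explicit
```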


On Thu, May 19, 2011 at 2:35 PM, Hu, Yinghua <> wrote:

> Hi,
>
> I am running a function from the Ubuntu command line and have run into a
> problem. I used a command like the one below:
>
> $ R --slave --vanilla < my_infile > my_outfile
>
> The result should be a single number in my_outfile. In my_infile, I call
> one extra function that ends with code like "return(x)".
>
> However, in the output file my_outfile
>
> It shows
>
> [1] 2.35343
>
> How do I get only
>
> 2.35343
>
> I mean to get rid of "[1]" part. Thanks a lot!
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to get rid of the index in result file

2011-05-19 Thread Hu, Yinghua
Hi,

I am running a function from the Ubuntu command line and have run into a 
problem. I used a command like the one below:

$ R --slave --vanilla < my_infile > my_outfile

The result should be a single number in my_outfile. In my_infile, I call one 
extra function that ends with code like "return(x)".

However, in the output file my_outfile

It shows

[1] 2.35343

How do I get only

2.35343

I mean to get rid of "[1]" part. Thanks a lot!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mediation

2011-05-19 Thread Андрей Гончар
You can see the whole function by printing its name in console.

Example:

> plot
function (x, y, ...)
{
if (is.function(x) && is.null(attr(x, "class"))) {
if (missing(y))
y <- NULL
hasylab <- function(...) !all(is.na(pmatch(names(list(...)),
"ylab")))
if (hasylab(...))
plot.function(x, y, ...)
else plot.function(x, y, ylab = paste(deparse(substitute(x)),
"(x)"), ...)
}
else UseMethod("plot")
}
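
If printing the name only shows UseMethod(...), the code lives in an S3 method; these helpers usually locate it (plot.lm is just one example method):

```r
methods(plot)               # list the available plot methods
getAnywhere(plot.lm)        # show a method even if it is not exported
getS3method("plot", "lm")   # equivalent, more explicit
```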


2011/5/19 Mervi Virtanen :
>
>
> Hi!
>
> I am trying to find out how bootstrap mediation works! How could I see the
> whole function or algorithm (or rather, the whole bootstrap)?
>
>
> t.Mete
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 

Андрей Гончар

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] K-M probability for each subject

2011-05-19 Thread hsu ya-hui
Dear R-user,

I would like to get the survival probability (surv) for each subject id.
That is, I want to have additional column surv as follows:

id  OS  OS_DUR  surv



eg:
fitKM<-survfit(Surv(OS_DUR,OS)~1,data=data)
fitKM$surv # this will only give me survival probability and I do not know
the corresponding id

Any one knows how to get the probability?

Thanks a lot,

Kate

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] trouble with summary tables with several variables using aggregate function

2011-05-19 Thread Luma R
Thank you all for the help and solutions presented!
Luisa
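
One more compact option for the same table, assuming d is the example data frame from the thread:

```r
tab <- xtabs(~ Var1 + Var2 + Var3, data = d)   # 3-way contingency table
ftable(tab, row.vars = c("Var1", "Var2"))      # D and I counts as columns
```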

On Thu, May 19, 2011 at 9:33 PM, Dennis Murphy  wrote:

> Oops, didn't see Marc's reply. His solution is much more compact. For
> R 2.11.0 and above, aggregate() now has a formula interface that
> usually works nicely:
>
> aggregate(Var3 ~ Var1 + Var2, data = d, FUN = table)
>   Var1 Var2 Var3.D Var3.I
> 1   S1   T1      2      2
> 2   S2   T1      2      2
> 3   S1   T2      2      2
> 4   S2   T2      0      4
>
> Dennis
>
> On Thu, May 19, 2011 at 12:29 PM, Dennis Murphy  wrote:
> > Hi:
> >
> > The dummy column really isn't necessary. Here's another way to get the
> > result you want. Let d be the name of your example data frame.
> >
> > d <- d[, 1:3]
> > (dtable <- as.data.frame(ftable(d, row.vars = c(1, 2))))
> >   Var1 Var2 Var3 Freq
> > 1   S1   T1    D    2
> > 2   S2   T1    D    2
> > 3   S1   T2    D    2
> > 4   S2   T2    D    0
> > 5   S1   T1    I    2
> > 6   S2   T1    I    2
> > 7   S1   T2    I    2
> > 8   S2   T2    I    4
> >
> > An alternative to the reshape() function is the reshape2 package,
> > which has a function dcast() that allows you to rearrange the data
> > frame as you desire.
> >
> > library(reshape2)
> > dcast(dtable, Var1 + Var2 ~ Var3)
> > Using Freq as value column: use value_var to override.
> >  Var1 Var2 D I
> > 1   S1   T1 2 2
> > 2   S1   T2 2 2
> > 3   S2   T1 2 2
> > 4   S2   T2 0 4
> >
> >
> > HTH,
> > Dennis
> >
> > On Thu, May 19, 2011 at 2:13 AM, Luma R  wrote:
> >> Dear all,
> >>
> >> I am having trouble creating summary tables using aggregate function.
> >>
> >> given the following table:
> >>
> >>
> >> Var1   Var2   Var3   dummy
> >> S1     T1     I      1
> >> S1     T1     I      1
> >> S1     T1     D      1
> >> S1     T1     D      1
> >> S1     T2     I      1
> >> S1     T2     I      1
> >> S1     T2     D      1
> >> S1     T2     D      1
> >> S2     T1     I      1
> >> S2     T1     I      1
> >> S2     T1     D      1
> >> S2     T1     D      1
> >> S2     T2     I      1
> >> S2     T2     I      1
> >> S2     T2     I      1
> >> S2     T2     I      1
> >>
> >>
> >> I want to create a summary table that shows for each category of Var1,
> >> Var2, the number of cells that are Var3=D and Var3-I :
> >>
> >> Var1   Var2   Var3(D)   Var3(I)
> >> S1     T1     2         2
> >> S1     T2     2         2
> >> S2     T1     2         2
> >> S2     T2     0         4
> >>
> >>
> >>
> >> However, if I do: Count.Cells=  aggregate(dummy~ Var1+Var2+Var3,
> FUN='sum')
> >> , I get:
> >>
> >>   Var1 Var2 Var3 Count of Resp
> >>     S1   T1    D             2
> >>     S1   T1    I             2
> >>     S1   T2    D             2
> >>     S1   T2    I             2
> >>     S2   T1    D             2
> >>     S2   T1    I             2
> >>     S2   T2    I             4
> >>
> >>
> >> Is there a way to get different columns for each Var3 level?
> >>
> >>
> >> Thank you for any help you can give!
> >>
> >>[[alternative HTML version deleted]]
> >>
> >> __
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] A better way to do this

2011-05-19 Thread 1Rnwb
Hello gurus,

I have a dataframe containing two groups, viz. 'control' and 'case'; each of
these groups contains longitudinal data for 100 subjects. I have to plot all
these subjects on a single chart and then add a regression line for each of
the groups over all the subjects. I have written a function to do the chart:
grpcharts <- function(dat, group, group2, molecule, cutoff) {
  dat2 <- dat[grep(group, dat$Group), ]
  ylim <- log2(c(min(dat2[, molecule], na.rm = TRUE) + 4,
                 max(dat2[, molecule], na.rm = TRUE) + 1))
  all.sub.id <- dat2$Subject.ID
  if (group == 'control') {
    col <- 'blue'
  } else {
    col <- 'red'
  }
  if (group2 == 'case') {
    col2 <- 'red'
  } else {
    col2 <- 'blue'
  }
  uniq.sub.id <- unique(all.sub.id)
  errcol <- c()
  for (i in 1:length(uniq.sub.id)) {
    sub <- dat2[grep(uniq.sub.id[i], dat2$Subject.ID), ]
    sub <- sub[order(sub$Age.at.Sample.Collection), ]
    sub <- sub[sub[, molecule] > cutoff, ]
    sub.id <- uniq.sub.id[i]
    if (dim(sub)[1] <= 1) errcol <- c(errcol, sub.id)
    if (dim(sub)[1] > 1) {
      par(new = TRUE)
      plot(log2(sub[, molecule]) ~ sub$Age.at.Sample.Collection,
           ylab = paste('Log2_', molecule, sep = ''),
           xlab = "Age at Sample", pch = 1, ylim = ylim, xlim = c(0, 25),
           main = paste(group, '-', molecule))
      mod <- loess(log2(sub[, molecule]) ~ Age.at.Sample.Collection,
                   na.action = na.exclude, data = sub)
      pred <- predict(mod)
      lines(pred ~ Age.at.Sample.Collection, data = sub, lwd = 1, lty = 1)
    }
  }
  dat2 <- dat2[order(dat2$Age.at.Sample.Collection), ]
  mod <- loess(log2(dat2[, molecule]) ~ Age.at.Sample.Collection,
               na.action = na.exclude, data = dat2)
  pred <- predict(mod)
  lines(pred ~ Age.at.Sample.Collection, data = dat2, lwd = 2, lty = 1,
        col = col)
  dat2 <- dat[grep(group2, dat$Group), ]
  dat2 <- dat2[order(dat2$Age.at.Sample.Collection), ]
  mod <- loess(log2(dat2[, molecule]) ~ Age.at.Sample.Collection,
               na.action = na.exclude, data = dat2)
  pred <- predict(mod)
  lines(pred ~ Age.at.Sample.Collection, data = dat2, lwd = 2, lty = 1,
        col = col2)
  legend(c(20, 20), c(ylim), c(group, group2), lty = 1, lwd = 2,
         col = c(col, col2), bty = 'n')
  print('done')
}

The function subsets the data based on 'group' and plots the data points.
Currently I am using a loop to assign the colors under the two conditions. I
need some pointers on assigning the colors to the regression lines for the
two groups without using a loop.
thanks
sharad
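
One loop-free pointer (a sketch against the argument names in the function above):

```r
col  <- ifelse(group  == "control", "blue", "red")
col2 <- ifelse(group2 == "case",    "red",  "blue")
## or, scaling to more groups, a named lookup vector:
group.cols <- c(control = "blue", case = "red")
col <- group.cols[[group]]
```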


--
View this message in context: 
http://r.789695.n4.nabble.com/A-better-way-to-do-this-tp3536576p3536576.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] recursive function

2011-05-19 Thread Tremblay, Pierre-Olivier
Hi,

I created a function for obtaining the normal cumulative distribution (I
know all this already exists in R, I just wanted to verify my
understanding of it).  below is the code I came up with.

cdf <- function(x) {
  erf <- function(x) {
    # approximation to the error function (erf) of the
    # normal cumulative distribution function
    # from Winitzki (2008)
    a     <- 0.147
    partc <- 1 + a*x^2
    partb <- 4/pi + a*x^2
    parta <- -x^2 * (partb/partc)
    erf   <- sqrt(1 - exp(parta))
    erf
  }
  # cumulative density function
  cdf <- 1/2 * (1 + erf(x/sqrt(2)))
  cdf
}

The erf(x) produces positive values whatever the sign of x.  Instead of
obtaining the expected sigmoid shape of erf() I end up with a V shape of
positive values.  erf(x) should be negative for negative values of x.  I
figure I need some form of conditional statement in the above code, but all
my attempts at using "if" and "for" failed miserably.  Clearly, I don't
understand how to use them properly in R.  If anyone could point me in the
right direction as to how to code conditional events in R, that would be
much appreciated.

many thanks in advance

Pierre-Olivier 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Separating boot results

2011-05-19 Thread Patrick Santoso
Thanks for the suggestion Jorge!

That gives me a 'subscript out of bounds' error. Perhaps I can tweak the
parameters? I've never worked with that command.

Pat
On Thu, May 19, 2011 at 2:12 PM, Jorge Ivan Velez
wrote:

> Hi Patrick,
>
> How about this (untested)?
>
> a <- codboot[c(4)]
> round(a$bca[4, 5], 2)
>
> HTH,
> Jorge
>
>
> On Thu, May 19, 2011 at 7:20 AM, Patrick Santoso <> wrote:
>
>> Good Morning,
>>
>> I'm having what I hope to be a simple problem. I am generating bootstrap
>> confidence intervals using package (boot) - which works perfectly. The
>> issue
>> I am having is getting the results into a format which I can write out to
>> a
>> database. To be clear I am having no problems generating the results, I
>> just
>> need to convert the format of the results such that I can store the
>> results
>> in a dataframe to save out to a database.
>>
>> I am doing the following:
>>
>> ## Generate 20,000 bootstrap samples
>> cod.boot <- boot(trimmed$ratio, cod, R=2)
>>
>> ## generate 90% BCA boostrap confidence intervals
>> codboot <- boot.ci(cod.boot,conf = c(0.90), type= c("bca"))
>>
>> At this point I have stored the answer I want to the variable codboot, but
>> the problem is it is in the format as follows:
>>
>> BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
>> Based on 2 bootstrap replicates
>>
>> CALL :
>> boot.ci(boot.out = cod.boot, conf = c(0.9), type = c("bca"))
>>
>> Intervals :
>> Level   BCa
>> 90%   ( 6.10, 10.23 )
>> Calculations and Intervals on Original Scale
>>
>> What I would like is the 6.10, and 10.23 each stored in their own
>> variables
>> so I can combine them into my existing dataframe (using cbind). The best
>> I've been able to do so far is:
>>
>> a <- codboot[c(4)]
>>
>> which gives me a:
>>
>> $bca
>> conf
>> [1,]  0.9 2343.41 19683.87 6.099788 10.23007
>>
>> which is closer, but I cannot parse this variable enough to get the 6.10
>> and
>> 10.23 out since apparently strsplit doesn't allow splitting on spaces.
>>
>> Any help would be greatly appreciated!
>>
>> Best Regards & Thank You in Advance
>>
>> Patrick
>>
>>
>>
>>
>>
>> --
>>
>> Patrick Santoso
>>
>> University of New Hampshire
>>
>>[[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>


-- 

Patrick Santoso

603 969 4673

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
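For later readers: the 'bca' component of the object boot.ci() returns is a plain numeric matrix whose last two columns hold the interval endpoints, so no string parsing is needed. A minimal sketch with made-up data:

```r
library(boot)  # ships with the standard R distribution

set.seed(42)
x  <- rnorm(200, mean = 8, sd = 2)                 # hypothetical data
b  <- boot(x, function(d, i) mean(d[i]), R = 999)  # bootstrap the mean
ci <- boot.ci(b, conf = 0.90, type = "bca")

lower <- ci$bca[1, 4]   # 4th column: lower endpoint
upper <- ci$bca[1, 5]   # 5th column: upper endpoint
```

lower and upper are ordinary numerics and can go straight into cbind()/data.frame() for writing out to a database.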


[R] lmer with 2 random effects with only two levels

2011-05-19 Thread gmd
Dear all,

I am analysing a data set based on 6 groups of individuals. Each group is
observed for 10 days: 5 days with one manipulation and 5 days with another.
I therefore have 6 replicate groups (n=6), each with one mean measurement
for manipulation A and manipulation B. Each group consists of a set of males
and females: an independent group of males for each group replicate, but
only 2 sets of females, each replicated 3 times within the six groups.

The data take the form:

group  response  female set  treatment

1      3.0       A           high
2      3.0       B           high
3      2.8       A           high
4      2.6       B           high
5      2.6       A           high
6      2.9       B           high
1      1.5       A           low
2      1.4       B           low
3      1.7       A           low
4      1.9       B           low
5      2.0       A           low
6      2.1       B           low

The order of treatment is counterbalanced and I would assume I would choose
to fit the model:

> model1<-lme(response~treat, random=~1|femaleset/group)

or
> model2<-lmer(response~treat+(1|femaleset/group))

However I am concerned about two aspects: my small sample size, of course,
but also that the random effect of female set has only two levels (A and B).

Is there a more appropriate way to handle this analysis?  A glm with female
set as an explanatory variable, for instance, such as:

model3 <- glm(response ~ treatment + femaleset + treatment*femaleset)

Although this will not properly account for the pseudoreplication.  Any
assistance or help would be greatly appreciated.

Best 
Colin


--
View this message in context: 
http://r.789695.n4.nabble.com/lmer-with-2-random-effects-with-only-two-levels-tp3536791p3536791.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] svytable and na's

2011-05-19 Thread Thomas Lumley
On Thu, May 19, 2011 at 11:41 PM, Manderscheid Katharina
 wrote:
> hi,
>
> i am trying to work with the survey package in order to apply survey design 
> weights. the data set i am using - ess - contains missing values.
> my question: when using svytable(~variable1+variable2, 
> design=my.svydesign.object, na.rm=T) the resulting crosstable contains all 
> missings although i defined the na's as such.
>

Could you give more details? There isn't an na.rm= argument to
svytable, so it's not surprising it has no effect, but I don't know
what you mean when you say the table "contains all missings." Perhaps
you could show us the output and say how it differs from what you
expected.

 -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
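As background for the archive: svytable() indeed has no na.rm argument; the usual way to drop missings is to subset the design object itself before tabulating. A sketch with hypothetical variable names and weights:

```r
library(survey)

df <- data.frame(v1 = factor(c("a", "b", NA, "a")),
                 v2 = factor(c("x", NA, "y", "y")),
                 w  = c(1, 2, 1, 3))                 # made-up design weights
des <- svydesign(ids = ~1, weights = ~w, data = df)

# remove rows with missing values from the *design*, then tabulate
des2 <- subset(des, !is.na(v1) & !is.na(v2))
tab  <- svytable(~ v1 + v2, design = des2)
```

For complex designs, subsetting the design object (rather than the raw data frame) keeps the design information consistent with the retained rows.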


Re: [R] R Style Guide -- Was Post-hoc tests in MASS using glm.nb

2011-05-19 Thread Kevin Wright
John Chambers is also very widely quoted describing the aim of S as "to turn
ideas into software, quickly and faithfully."  Perhaps S3 is for people
wanting to do things "quickly" and S4 is for those wanting to do things
"faithfully".

Kevin

On Wed, May 18, 2011 at 7:39 PM,  wrote:

> I used to think like that.  However I have recently re-read John Chambers'
> "Software for Data Analysis" and now I'm starting to see the point.
>
> S4 classes and methods do require you to plan your classes and methods well
> and they do impose a discipline that can seem rigid and unnecessary.  But I
> have found that to program well you do need to exercise a lot of
> discipline, mainly because it can take quite some time to spot all the
> inadequacies and even traps in your code that an ill-disciplined approach
> lets you get away with at first.
>
> IMHO, of course.  Perhaps you can all see the traps that elude me.
>
> Cheers,
> Bill.
>
> PS Rolf Turner?  Respectful?  Goodness, what's going on?  :-)
>
> -Original Message-
> From: Rolf Turner [mailto:rolf.tur...@xtra.co.nz]
> Sent: Thursday, 19 May 2011 9:34 AM
> To: Venables, Bill (CMIS, Dutton Park)
> Cc: r-help@r-project.org
> Subject: Re: [R] R Style Guide -- Was Post-hoc tests in MASS using glm.nb
>
>
> On 19/05/11 10:26, bill.venab...@csiro.au wrote:
>
> 
> > Most of [the Google style guide's] advice is very good (meaning I agree
> with it!) but some is a bit too much (for example, the blanket advice never
> to use S4 classes and methods - that's just resisting progress, in my view).
> 
>
> I must respectfully disagree with this view, and concur heartily with
> the style guide.
> S4 classes and methods are a ball-and-chain that one has to drag along.
> See also
> fortune("S4 methods"). :-)
>
> cheers,
>
> Rolf
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Flattening lists and environments (was: "how to flatten a list to the same level?")

2011-05-19 Thread Janko Thyson

Dear list,

I came up with two functions that flatten arbitrarily deeply nested 
lists (as long as they're named; not tested for unnamed) and 
environments (see attachment; 'flatten_examples.txt' contains some 
examples).


The paradigm is somewhat similar to that implemented in 'unlist()', yet 
extends it. I would have very much liked to build upon the superfast 
functionality of 'unlist()', but there are some obstacles to this (see 
these two related posts at r-devel from today: 
https://stat.ethz.ch/pipermail/r-devel/2011-May/061070.html and 
https://stat.ethz.ch/pipermail/r-devel/2011-May/061071.html).


Therefore, I had to use a recursive looping paradigm. Yet, if anyone has 
some suggestions on how to speed things up (maybe some Rcpp-people feel 
"called upon"?!? ;-)), I'd appreciate any pointers. Yet I do hope that 
what I came up with is at least of some value for those that posted 
similar questions on how to flexibly flatten nested objects in the past 
(that's why I'm also referring to this older post below; I also build 
upon the code provided by Henrique Dallazuanna and Mark Heckmann).


Best regards,
Janko

PS: Maybe this should rather go into a blog-post, but I don't have one 
yet ;-)


On 19.05.2011 22:16, Janko Thyson wrote:
From: Mark Heckmann > 


Date: Sat, 09 Jan 2010 13:49:15 +0100

Henrique,

thanks for the code!! It works out fine for vectors. I forgot to 
mention I also have dataframes as list elements. Thus I want the 
structure of the list element to be kept intact.


I tried an recursive approach (which unfortunately resulted in some 
more code) which works.


.getNonListElements <- function(x, env){
    if (class(x) == "list") {
        for (i in seq(along = x)) .getNonListElements(x[[i]], env)  # call recursively
    } else {
        res <- get("res", envir = env)    # get res from other env
        res <- c(res, list(x))            # add one list element
        assign("res", res, envir = env)   # assign back to env
    }
}

flattenList <- function(l){
    res <- list()                # make list object
    env <- environment()         # get current env
    .getNonListElements(l, env)  # search for non-list elements recursively
    return(res)
}

l <- list(DF=data.frame(A=c(1,2)), vec=c("a", "b"))
l <- list(l, l)

> flattenList(l)
[[1]]
  A
1 1
2 2

[[2]]
[1] "a" "b"

[[3]]
  A
1 1
2 2

[[4]]
[1] "a" "b"

I am not sure if one can avoid the wrapper function or still use 
rapply to simplify the code. I do not know how. One more thing I would 
like to add are the objects names to the generated list. But I did not 
succeed in that.


Mark
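A shorter, environment-free variant of flattenList() is plain recursion with concatenation; the is.data.frame() guard keeps data frames intact as leaves (they are lists too), and names survive because lapply() keeps them and c() merges them. A sketch:

```r
flatten2 <- function(x) {
  # treat anything that is not a list -- or that is a data frame -- as a leaf
  if (!is.list(x) || is.data.frame(x)) return(list(x))
  do.call(c, lapply(x, flatten2))   # lapply keeps names; c() merges them
}

l   <- list(DF = data.frame(A = c(1, 2)), vec = c("a", "b"))
res <- flatten2(list(l, l))
names(res)   # "DF" "vec" "DF" "vec"
```

This also answers the follow-up wish above: the leaf names from each branch are carried into the flattened result.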

Am 08.01.2010 um 18:29 schrieb Henrique Dallazuanna:

>  Try something about like this:
>
>  split(unlist(l), rep(1:length(idx<- rapply(l, length)), idx))
>
>  On Fri, Jan 8, 2010 at 1:35 PM, Mark Heckmann
>  wrote:
>>  I have a nested list l like:
>>
>>  l<- list(A=c(1,2,3), B=c("a", "b"))
>>  l<- list(l,l, list(l,l))
>>
>>  I want the list to be unlisted, but not on the lowest level of each
>>  "branch".
>>  I want the lowest level of each list branch to remain as it is.
>>  So unlist or unlist(rec=F) do not work here as the level of nesting
>>  may
>>  differ on the elements.
>>  The result should look like:
>>
>>  $A
>>  [1] 1 2 3
>>
>>  $B
>>  [1] "a" "b"
>>
>>  $A
>>  [1] 1 2 3
>>
>>  $B
>>  [1] "a" "b"
>>
>>  $A
>>  [1] 1 2 3
>>
>>  $B
>>  [1] "a" "b"
>>
>>  $A
>>  [1] 1 2 3
>>
>>  $B
>>  [1] "a" "b"
>>
>>  Any ideas?
>>  TIA!
>>
>>  Mark
>>
>>
>>  
--
>>  Mark Heckmann
>>  Dipl. Wirt.-Ing. cand. Psych.
>>  Vorstraße 93 B01
>>  28359 Bremen
>>  Blog:www.markheckmann.de
>>  R-Blog:http://ryouready.wordpress.com
>>
>>  __
>>  R-help_at_r-project.org mailing list
>>  https://stat.ethz.ch/mailman/listinfo/r-help
>>  PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
>>  and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
>  -- 
>  Henrique Dallazuanna

>  Curitiba-Paraná-Brasil
>  25° 25' 40" S 49° 16' 22" O

--

Mark Heckmann
Dipl. Wirt.-Ing. cand. Psych.
Vorstraße 93 B01
28359 Bremen
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com
--


*Janko Thyson*
janko.thy...@ku-eichstaett.de 

Catholic University of Eichstätt-Ingolstadt
Ingolstadt School of Management
Statistics and Quantitative Methods
Auf der Schanz 49
D-85049 Ingolstadt

www.wfi.edu/lsqm 

Fon: +49 841 937-1923
Fax: +49 841 9

Re: [R] Shrink file size of pdf graphics

2011-05-19 Thread Layman123
Thank you all for the quick answers!

I googled first instead of having the idea to search this forum...
I'm using R 2.12.2 on a 32-bit Computer with windows installed.

Up to this point I was trying to get the image the way I would like to have
it, but didn't get fully satisfactory results.
I tried both the plot devices in R and pdftk. First I tried the png device,
but as I increased the number of pixels with 'width' and 'height', the
labels got smaller, and attempting to prevent this manually with 'cex'
didn't give satisfactory results. Searching the list I read about the
package 'cairoDevice', which is supposed to prevent this by increasing only
the resolution of the image while changing nothing about the size of the
objects in the plot. After I managed to install it, I got an error trying
to open a 'Cairo' device; I probably missed installing something, and the
CRAN page doesn't say how to do this. Is there a way in R to produce images
with high resolution?

Using pdftk I managed to compress the file size from 13 MB to 3.6 MB, which
is really good compression. However, I need an even smaller file size.
Unfortunately I saw your post very late, Aaron. With this command, a PDF of
3.4 MB was produced. pdftk doesn't seem to offer any options to influence
how much the PDF is compressed.
Is there a way to do this with the gs command so that it compresses even
more?
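For reference, Ghostscript's pdfwrite device does expose compression presets that pdftk lacks; a sketch of the usual invocation (file names hypothetical):

```sh
# /screen is the most aggressive preset; /ebook and /printer trade size for quality
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/screen -dCompatibilityLevel=1.4 \
   -dNOPAUSE -dBATCH -dQUIET -sOutputFile=smaller.pdf input.pdf
```

Image-heavy plots can often be squeezed further by adding -dDownsampleColorImages=true with -dColorImageResolution set below the preset's default.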

--
View this message in context: 
http://r.789695.n4.nabble.com/Shrink-file-size-of-pdf-graphics-tp3536042p3536850.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Anyone successfully install Rgraphviz on windows with R 2.13?

2011-05-19 Thread Michael Conklin
I have been trying to get Rgraphviz to work (I know it is from Bioconductor) 
unsuccessfully. Since I have no experience with Bioconductor I thought I would 
ask here if anyone has advice. I have installed Graphviz 2.20.3 as is 
recommended on the Bioconductor site but basically R cannot seem to find the 
needed dll files.  So, even though I have added the appropriate directories to 
the system path R cannot seem to find them. Any tips would be appreciated.

W. Michael Conklin
Chief Methodologist
Google Voice: (612) 56STATS

MarketTools, Inc. | www.markettools.com
6465 Wayzata Blvd | Suite 170 |  St. Louis Park, MN 55426.  PHONE: 952.417.4719 
| CELL: 612.201.8978
This email and attachment(s) may contain confidential and/or proprietary 
information and is intended only for the intended addressee(s) or its 
authorized agent(s). Any disclosure, printing, copying or use of such 
information is strictly prohibited. If this email and/or attachment(s) were 
received in error, please immediately notify the sender and delete all copies


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] vectorial search for elements with certain attributes

2011-05-19 Thread William Dunlap
> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Dennis Murphy
> Sent: Thursday, May 19, 2011 11:38 AM
> To: infoc...@gmx.net
> Cc: r-help@r-project.org
> Subject: Re: [R] vectorial search for elements with certain attributes
> 
> Hi:
> 
> Here's one possible approach:
> 
> A <- c(1,3,7,8,10)
> B <- c(5,8)
> apply(outer(A, B, '-'), 2, function(x) min(which(x >= 0)))
> [1] 3 4

Using findInterval() or as.integer(cut()) would use
quite a bit less memory and time for long vectors.
findInterval() would be more direct, but I think it
insists on using ">" instead of ">=" in the above code.
cut() lets you choose.

  > as.integer(cut(c(5, 8, 8.01), A, right=TRUE))+1
  [1] 3 4 5
  > as.integer(cut(c(5, 8, 8.01), A, right=FALSE))+1
  [1] 3 5 5
  > findInterval(c(5, 8, 8.01), A) + 1
  [1] 3 5 5

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 
> 
> HTH,
> Dennis
> 
> On Thu, May 19, 2011 at 2:58 AM,   wrote:
> > I have 2 vectors A and B. For each element of B I'd like to
> > find the index of the next higher or equal element in A, and
> > I'd like to store it efficiently. E.g.:
> > A <- c(1,3,7,8,10)
> > B <- c(5,8)
> >
> > result: 3, 4
> >
> > I have a possibility, but for long vectors it is not very
> > efficient:
> >
> > ans <- sapply(B, function(x) which.max(A[A < x]) )
> > as.integer(ans + 1)
> >
> > Is there anyone who has a better idea?
> > Thank you for your help,
> > Thomas.
> > --
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Email out of R (code)

2011-05-19 Thread Bos, Roger
I took a look at sendmailR when I was trying to figure out how to send
email a few years ago.  I ended up using my SQL Server dbmail facility,
which I can easily access from R using the RODBC package.  In case my
scenario applies to anyone else, I will paste the simple function I
wrote to serve as an example to anyone else who may want to do the same
thing:


"email" <- function(to, subject, msg, texttype='TEXT', imp='NORMAL',
profileName='Roger Bos') {
go <- try(sqlQuery(xf, "EXECUTE msdb..sp_send_dbmail  
 @profile_name  = '" %+% profileName %+% "',
  @recipients= '" %+% to %+% "',
 @body  = '" %+% msg %+% "',
 @subject   = '" %+% subject %+% "',
 @query = '',
  @body_format   = " %+% texttype %+% ",
  @importance= '" %+% imp %+% "',
  @attach_query_result_as_file = 0"))
}
 

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Daniel Malter
Sent: Thursday, May 19, 2011 2:04 PM
To: r-help@r-project.org
Subject: Re: [R] Email out of R (code)

As I (thought I) understood from the sendmailR manual, the package does
currently not support server authentication, or does it?

Daniel 

--
View this message in context:
http://r.789695.n4.nabble.com/Email-out-of-R-code-tp3530671p3536512.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
***

This message is for the named person's use only. It may\...{{dropped:20}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] trouble with summary tables with several variables using aggregate function

2011-05-19 Thread Dennis Murphy
Oops, didn't see Marc's reply. His solution is much more compact. For
R 2.11.0 and above, aggregate() now has a formula interface that
usually works nicely:

aggregate(Var3 ~ Var1 + Var2, data = d, FUN = table)
  Var1 Var2 Var3.D Var3.I
1   S1   T1  2  2
2   S2   T1  2  2
3   S1   T2  2  2
4   S2   T2  0  4

Dennis

On Thu, May 19, 2011 at 12:29 PM, Dennis Murphy  wrote:
> Hi:
>
> The dummy column really isn't necessary. Here's another way to get the
> result you want. Let d be the name of your example data frame.
>
> d <- d[, 1:3]
> (dtable <- as.data.frame(ftable(d, row.vars = c(1, 2))))
>  Var1 Var2 Var3 Freq
> 1   S1   T1    D    2
> 2   S2   T1    D    2
> 3   S1   T2    D    2
> 4   S2   T2    D    0
> 5   S1   T1    I    2
> 6   S2   T1    I    2
> 7   S1   T2    I    2
> 8   S2   T2    I    4
>
> An alternative to the reshape() function is the reshape2 package,
> which has a function dcast() that allows you to rearrange the data
> frame as you desire.
>
> library(reshape2)
> dcast(dtable, Var1 + Var2 ~ Var3)
> Using Freq as value column: use value_var to override.
>  Var1 Var2 D I
> 1   S1   T1 2 2
> 2   S1   T2 2 2
> 3   S2   T1 2 2
> 4   S2   T2 0 4
>
>
> HTH,
> Dennis
>
> On Thu, May 19, 2011 at 2:13 AM, Luma R  wrote:
>> Dear all,
>>
>> I am having trouble creating summary tables using aggregate function.
>>
>> given the following table:
>>
>>
>> Var1   Var2    Var3   dummy
>> S1       T1         I         1
>> S1       T1         I         1
>> S1       T1         D        1
>> S1       T1         D        1
>> S1       T2         I         1
>> S1       T2         I         1
>> S1       T2         D        1
>> S1       T2         D        1
>> S2       T1         I         1
>> S2       T1         I         1
>> S2       T1         D        1
>> S2       T1         D        1
>> S2       T2         I         1
>> S2       T2         I         1
>> S2       T2         I        1
>> S2       T2         I        1
>>
>>
>> I want to create a summary table that shows for each category of Var1,
>> Var2, the number of cells that are Var3=D and Var3-I :
>>
>>         Var1 Var2  Var3(D)   Var3(I)
>>         S1     T1    2              2
>>         S1     T2    2              2
>>         S2     T1    2              2
>>         S2     T2    0              4
>>
>>
>>
>> However, if I do: Count.Cells=  aggregate(dummy~ Var1+Var2+Var3, FUN='sum')
>> , I get:
>>
>>           Var1 Var2  Var3 Count of Resp
>>            S1     T1     D        2
>>            S1     T1     I          2
>>            S1     T2     D        2
>>            S1     T2     I         2
>>            S2     T1     D       2
>>            S2      T1    I        2
>>            S2     T2     I        4
>>
>>
>> Is there a way to get different columns for each Var3 level?
>>
>>
>> Thank you for any help you can give!
>>
>>        [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] trouble with summary tables with several variables using aggregate function

2011-05-19 Thread Dennis Murphy
Hi:

The dummy column really isn't necessary. Here's another way to get the
result you want. Let d be the name of your example data frame.

d <- d[, 1:3]
(dtable <- as.data.frame(ftable(d, row.vars = c(1, 2))))
  Var1 Var2 Var3 Freq
1   S1   T1    D    2
2   S2   T1    D    2
3   S1   T2    D    2
4   S2   T2    D    0
5   S1   T1    I    2
6   S2   T1    I    2
7   S1   T2    I    2
8   S2   T2    I    4

An alternative to the reshape() function is the reshape2 package,
which has a function dcast() that allows you to rearrange the data
frame as you desire.

library(reshape2)
dcast(dtable, Var1 + Var2 ~ Var3)
Using Freq as value column: use value_var to override.
  Var1 Var2 D I
1   S1   T1 2 2
2   S1   T2 2 2
3   S2   T1 2 2
4   S2   T2 0 4


HTH,
Dennis

On Thu, May 19, 2011 at 2:13 AM, Luma R  wrote:
> Dear all,
>
> I am having trouble creating summary tables using aggregate function.
>
> given the following table:
>
>
> Var1   Var2    Var3   dummy
> S1       T1         I         1
> S1       T1         I         1
> S1       T1         D        1
> S1       T1         D        1
> S1       T2         I         1
> S1       T2         I         1
> S1       T2         D        1
> S1       T2         D        1
> S2       T1         I         1
> S2       T1         I         1
> S2       T1         D        1
> S2       T1         D        1
> S2       T2         I         1
> S2       T2         I         1
> S2       T2         I        1
> S2       T2         I        1
>
>
> I want to create a summary table that shows for each category of Var1,
> Var2, the number of cells that are Var3=D and Var3-I :
>
>         Var1 Var2  Var3(D)   Var3(I)
>         S1     T1    2              2
>         S1     T2    2              2
>         S2     T1    2              2
>         S2     T2    0              4
>
>
>
> However, if I do: Count.Cells=  aggregate(dummy~ Var1+Var2+Var3, FUN='sum')
> , I get:
>
>           Var1 Var2  Var3 Count of Resp
>            S1     T1     D        2
>            S1     T1     I          2
>            S1     T2     D        2
>            S1     T2     I         2
>            S2     T1     D       2
>            S2      T1    I        2
>            S2     T2     I        4
>
>
> Is there a way to get different columns for each Var3 level?
>
>
> Thank you for any help you can give!
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] mediation

2011-05-19 Thread Mervi Virtanen



Hi!

I am trying to find out how bootstrap mediation works. How could I see the
whole function or algorithm (or rather, the whole bootstrap)?



t.Mete
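A general pointer (independent of any particular mediation package): R prints a function's full definition when you type its name without parentheses, and getAnywhere() locates definitions hidden inside namespaces, so you can read exactly what a bootstrap routine does. A sketch:

```r
mean                           # typing the name (no parentheses) prints the definition
body(sd)                       # just the body of a function
getAnywhere("t.test.default")  # finds methods, even non-exported ones
```

For package functions, getAnywhere("functionname") or pkg:::functionname shows the source even when the function is not exported.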

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with Princurve

2011-05-19 Thread David Winsemius


On May 19, 2011, at 9:43 AM, guy33 wrote:


Hey all,

I can't seem to get the princurve package to produce correct results, even
in the simplest cases.  For example, if you just generate a 1-period
noiseless sine wave and ask for the principal curve and plot, the returned
curve is clearly wrong (doesn't follow the sine wave).  Here's my code:

library(princurve)
x <- runif(1000,0,2*pi); x <- cbind(x/(2*pi), sin(x))
fit1 <- principal.curve(x, plot = TRUE)

Anyone have any suggestions?


The rule for contributed packages which you feel can be demonstrated
to give incorrect results is to contact the maintainer of the
package. In this case I wonder if you even know what you are doing,
however. Reading the help page for that function does not suggest to
me that you should be expecting it to "go through the points":

"Fits a principal curve which describes a smooth curve that passes
through the middle of the data x in an orthogonal sense."

The plotted curve does seem to adhere to that description, with an
emphasis on the last two words.



 If you run this code, do you get the correct
principal curve?


For the meaning assigned to "correct" by the authors, then yes.



David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] a tool for indirect solving of systems of equations

2011-05-19 Thread Carl Witthoft
I wrote this goody a while back, and thought I'd post it in case anyone 
can find a use for it.  I'm mentioning it here because I'm not convinced 
it warrants a full package release.


The function is named "ktsolve,"  and is available at
http://witthoft.com/rtools.html  or directly at 
http://witthoft.com/ktsolve.R


It makes use of the excellent BBsolve engine; I named it in honor of
one of my favorite tools, TK!Solver.


Carl

PS feel free to send me bug/error reports, screams about the limited 
documentation, etc.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] vectorial search for elements with certain attributes

2011-05-19 Thread Dennis Murphy
Hi:

Here's one possible approach:

A <- c(1,3,7,8,10)
B <- c(5,8)
apply(outer(A, B, '-'), 2, function(x) min(which(x >= 0)))
[1] 3 4

HTH,
Dennis

On Thu, May 19, 2011 at 2:58 AM,   wrote:
> I have 2 vectors A and B. For each element of B I'd like to find the index of 
> the next higher or equal element in A, and I'd like to store it efficiently.
> E.g.:
> A <- c(1,3,7,8,10)
> B <- c(5,8)
>
> result: 3, 4
>
> I have a possibility, but for long vectors it is not very efficient:
>
> ans <- sapply(B, function(x) which.max(A[A < x]) )
> as.integer(ans + 1)
>
> Is there anyone who has a better idea?
> Thank you for your help,
> Thomas.
> --
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] trouble with summary tables with several variables using aggregate function

2011-05-19 Thread Marc Schwartz
Another approach, using aggregate(), presuming that the data is in a data frame 
called 'DF':

> with(DF, aggregate(Var3, list(Var1 = Var1, Var2 = Var2), table))
  Var1 Var2 x.D x.I
1   S1   T1   2   2
2   S2   T1   2   2
3   S1   T2   2   2
4   S2   T2   0   4


HTH,

Marc Schwartz
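
A closely related route (a sketch, again assuming the data is in a data frame
called 'DF') is xtabs(): its 3-way table can be flattened with ftable() so the
Var3 levels become columns, with zero cells included:

```r
# Reconstruct the example data from the thread
DF <- data.frame(
  Var1 = rep(c("S1", "S2"), each = 8),
  Var2 = rep(rep(c("T1", "T2"), each = 4), 2),
  Var3 = c("I", "I", "D", "D", "I", "I", "D", "D",
           "I", "I", "D", "D", "I", "I", "I", "I")
)

# Count occurrences of each Var3 level within each Var1 x Var2 cell
tab <- xtabs(~ Var1 + Var2 + Var3, data = DF)

# Flatten so the Var3 levels become columns (S2/T2/D shows up as 0)
ftable(tab, row.vars = c("Var1", "Var2"), col.vars = "Var3")
```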


On May 19, 2011, at 1:10 PM, Phil Spector wrote:

> Luma -
>   If I understand you correctly, I think the easiest way
> to get what you want is to use the reshape function on
> the output from aggregate:
> 
>> reshape(Count.Cells,idvar=c('Var1','Var2'),timevar='Var3',direction='wide')
>  Var1 Var2 dummy.D dummy.I
> 1   S1   T1   2   2
> 2   S2   T1   2   2
> 3   S1   T2   2   2
> 7   S2   T2  NA   4
> 
>   - Phil Spector
>Statistical Computing Facility
>Department of Statistics
>UC Berkeley
>spec...@stat.berkeley.edu
> 
> 
> 
> On Thu, 19 May 2011, Luma R wrote:
> 
>> Dear all,
>> 
>> I am having trouble creating summary tables using aggregate function.
>> 
>> given the following table:
>> 
>> 
>> Var1 Var2 Var3 dummy
>>   S1   T1    I     1
>>   S1   T1    I     1
>>   S1   T1    D     1
>>   S1   T1    D     1
>>   S1   T2    I     1
>>   S1   T2    I     1
>>   S1   T2    D     1
>>   S1   T2    D     1
>>   S2   T1    I     1
>>   S2   T1    I     1
>>   S2   T1    D     1
>>   S2   T1    D     1
>>   S2   T2    I     1
>>   S2   T2    I     1
>>   S2   T2    I     1
>>   S2   T2    I     1
>> 
>> 
>> I want to create a summary table that shows for each category of Var1,
>> Var2, the number of cells that are Var3=D and Var3=I:
>> 
>> Var1 Var2  Var3(D)  Var3(I)
>>   S1   T1        2        2
>>   S1   T2        2        2
>>   S2   T1        2        2
>>   S2   T2        0        4
>> 
>> 
>> 
>> However, if I do: Count.Cells=  aggregate(dummy~ Var1+Var2+Var3, FUN='sum')
>> , I get:
>> 
>>  Var1 Var2 Var3 Count of Resp
>>    S1   T1    D             2
>>    S1   T1    I             2
>>    S1   T2    D             2
>>    S1   T2    I             2
>>    S2   T1    D             2
>>    S2   T1    I             2
>>    S2   T2    I             4
>> 
>> 
>> Is there a way to get different columns for each Var3 level?
>> 
>> 
>> Thank you for any help you can give!



Re: [R] Separating boot results

2011-05-19 Thread Jorge Ivan Velez
Hi Patrick,

How about this (untested)?

a <- codboot[c(4)]
round(a$bca[4, 5], 2)

HTH,
Jorge
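
To spell that out: the object returned by boot.ci() is a list, and its "bca"
component is a plain numeric matrix whose last two columns hold the lower and
upper limits, so no text parsing is needed. A sketch (the matrix below is a
stand-in mimicking the structure shown later in the thread):

```r
# Stand-in for codboot$bca -- one row per confidence level, with the
# interval endpoints in the last two columns
bca <- matrix(c(0.9, 2343.41, 19683.87, 6.099788, 10.23007), nrow = 1)

lower <- bca[1, ncol(bca) - 1]   # 6.099788
upper <- bca[1, ncol(bca)]       # 10.23007

# Ready to cbind() onto an existing data frame
data.frame(lower = round(lower, 2), upper = round(upper, 2))
```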


On Thu, May 19, 2011 at 7:20 AM, Patrick Santoso <> wrote:

> Good Morning,
>
> I'm having what I hope to be a simple problem. I am generating bootstrap
> confidence intervals using package (boot) - which works perfectly. The
> issue
> I am having is getting the results into a format which I can write out to a
> database. To be clear I am having no problems generating the results, I
> just
> need to convert the format of the results such that I can store the results
> in a dataframe to save out to a database.
>
> I am doing the following:
>
> ## Generate 20,000 bootstrap samples
> cod.boot <- boot(trimmed$ratio, cod, R=2)
>
> ## generate 90% BCA bootstrap confidence intervals
> codboot <- boot.ci(cod.boot,conf = c(0.90), type= c("bca"))
>
> At this point I have stored the answer I want to the variable codboot, but
> the problem is it is in the format as follows:
>
> BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
> Based on 2 bootstrap replicates
>
> CALL :
> boot.ci(boot.out = cod.boot, conf = c(0.9), type = c("bca"))
>
> Intervals :
> Level   BCa
> 90%   ( 6.10, 10.23 )
> Calculations and Intervals on Original Scale
>
> What I would like is the 6.10, and 10.23 each stored in their own variables
> so I can combine them into my existing dataframe (using cbind). The best
> I've been able to do so far is:
>
> a <- codboot[c(4)]
>
> which gives me a:
>
> $bca
> conf
> [1,]  0.9 2343.41 19683.87 6.099788 10.23007
>
> which is closer, but I cannot parse this variable enough to get the 6.10
> and
> 10.23 out since apparently strsplit doesn't allow splitting on spaces.
>
> Any help would be greatly appreciated!
>
> Best Regards & Thank You in Advance
>
> Patrick
>
>
>
>
>
> --
>
> Patrick Santoso
>
> University of New Hampshire
>



Re: [R] trouble with summary tables with several variables using aggregate function

2011-05-19 Thread Phil Spector

Luma -
   If I understand you correctly, I think the easiest way
to get what you want is to use the reshape function on
the output from aggregate:


reshape(Count.Cells,idvar=c('Var1','Var2'),timevar='Var3',direction='wide')

  Var1 Var2 dummy.D dummy.I
1   S1   T1   2   2
2   S2   T1   2   2
3   S1   T2   2   2
7   S2   T2  NA   4

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu



On Thu, 19 May 2011, Luma R wrote:


Dear all,

I am having trouble creating summary tables using aggregate function.

given the following table:


Var1 Var2 Var3 dummy
  S1   T1    I     1
  S1   T1    I     1
  S1   T1    D     1
  S1   T1    D     1
  S1   T2    I     1
  S1   T2    I     1
  S1   T2    D     1
  S1   T2    D     1
  S2   T1    I     1
  S2   T1    I     1
  S2   T1    D     1
  S2   T1    D     1
  S2   T2    I     1
  S2   T2    I     1
  S2   T2    I     1
  S2   T2    I     1


I want to create a summary table that shows for each category of Var1,
Var2, the number of cells that are Var3=D and Var3=I:

Var1 Var2  Var3(D)  Var3(I)
  S1   T1        2        2
  S1   T2        2        2
  S2   T1        2        2
  S2   T2        0        4



However, if I do: Count.Cells=  aggregate(dummy~ Var1+Var2+Var3, FUN='sum')
, I get:

 Var1 Var2 Var3 Count of Resp
   S1   T1    D             2
   S1   T1    I             2
   S1   T2    D             2
   S1   T2    I             2
   S2   T1    D             2
   S2   T1    I             2
   S2   T2    I             4


Is there a way to get different columns for each Var3 level?


Thank you for any help you can give!






Re: [R] Email out of R (code)

2011-05-19 Thread Daniel Malter
As I (thought I) understood from the sendmailR manual, the package currently
does not support server authentication, or does it?

Daniel 

--
View this message in context: 
http://r.789695.n4.nabble.com/Email-out-of-R-code-tp3530671p3536512.html
Sent from the R help mailing list archive at Nabble.com.



[R] Border-Layout by multiple Plots

2011-05-19 Thread Stuber Thomas TA.I_BB_BS.0701
Hello R-Community

I did a lot of research, but I found no answer for my issue.

I am trying to plot multiple plots, which I do with the layout() function.
The following two pictures show the defined layout:
http://www.stuber.info/layout_1.jpg
http://www.stuber.info/layout_2.jpg

The final plots looks like:
http://www.stuber.info/plot_1.jpg
http://www.stuber.info/plot_2.jpg


So far, so good. My problem is that I'm not able to draw the borders 
as needed. See layout_1.jpg and layout_2.jpg again: the colored lines 
represent the needed borders.
I tried to draw lines into every plot to build a global border, but it's a hack 
job and it's "ugly". I also tried to build the layout with the par() or 
split.screen() functions, but without success. Nesting layouts also seems 
not to be possible in R (or does anybody know a way to do it?)

Can anyone help me with this issue?
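
One idea worth trying (a sketch, not a tested answer to the exact layouts
linked above): base graphics can frame each panel's figure region and the
whole device with box(), which can remove the need for hand-drawn border
lines:

```r
# Three panels: two on top, one spanning the bottom row
layout(matrix(c(1, 2, 3, 3), nrow = 2, byrow = TRUE))
par(oma = c(1, 1, 1, 1))          # leave room for an outer frame

for (i in 1:3) {
  plot(rnorm(10), main = paste("panel", i))
  box(which = "figure", col = "blue")   # border around this panel
}
box(which = "outer", col = "red")       # border around the whole layout
```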



[R] FW: problem with optim()

2011-05-19 Thread chirine wolley


 


From: wolley.chir...@hotmail.com
To: biomathjda...@gmail.com
Subject: RE: [R] problem with optim()
Date: Thu, 19 May 2011 15:45:15 +0200




First, thank you for your response.
Actually, I didn't write out the entire code: X, Y and p_tilde are of course all 
defined at the beginning of my code.
Normally the optimal values that I get should let me create a model for 
prediction; however, the model currently obtained is not good at all.
And I think the fact that it gives me exactly the same values when I change the 
function to maximize shows that there's a problem in my code.
 
> Date: Thu, 19 May 2011 09:29:02 -0400
> Subject: Re: [R] problem with optim()
> From: biomathjda...@gmail.com
> To: wolley.chir...@hotmail.com
> CC: r-help@r-project.org
> 
> What do you mean when you say "wrong results"? What do you expect for
> the output? Your code doesn't work for me because it references X in
> places and X is not defined.
> 
> Have you tested your functions to make sure they return reasonable values?
> 
> On Thu, May 19, 2011 at 9:17 AM, chirine wolley
>  wrote:
> >
> > Dear R-users,
> >
> > I would like to maximize the function g above which depends on 4 parameters 
> > (2 vectors, 1 real number, and 1 matrix)  using optim() and BFGS method. 
> > Here is my code:
> >
> > # function to maximize
> >
> > g=function(x)
> > {
> > x1 = x[1:ncol(X)]
> > x2 = x[(ncol(X)+1)]
> > x3 = 
> > matrix(x[(ncol(X)+2):(ncol(X)+1+ncol(X)*ncol(Y))],nrow=ncol(X),ncol=ncol(Y))
> > x4 = x[(ncol(X)+1+ncol(X)*ncol(Y)+1):length(x)]
> > res1=rep(0,nrow(X))
> > res2=matrix(0,nrow=nrow(X),ncol=ncol(Y))
> > log.res2=matrix(0,nrow=nrow(X),ncol=ncol(Y))
> > res2.b=rep(0,nrow(X))
> > res3 = rep(0,nrow(X))
> > res3.b = rep(0,nrow(X))
> > for (i in 1:nrow(X))
> > {
> > res1[i]=1/(1+exp(-t(x1)%*%X[i,]-x2))
> > for (t in 1:ncol(Y))
> > {
> > res2[i,t] = 
> > ((1-(1+exp(-t(x3[,t])%*%X[i,]-x4[t]))^(-1))^(abs(Y[i,t]-Yb[i])))*(((1+exp(-t(x3[,t])%*%X[i,]-x4[t]))^(-1))^(1-abs(Y[i,t]-Yb[i])))
> > log.res2[i,t]=log(res2[i,t])
> > res2.b[i]=res2.b[i]+log.res2[i,t]
> > }
> > res3[i] = p_tilde[i]*log(res1[i])
> > res3.b[i] = p_tilde[i]*(res2.b[i])
> > }
> > -(ncol(Y)*sum(res3)+sum(res3.b))
> >
> > }
> >
> > # Gradients:
> >
> > gr=function(x)
> > {
> > x1 = x[1:ncol(X)]
> > x2 = x[(ncol(X)+1)]
> > x3 = 
> > matrix(x[(ncol(X)+2):(ncol(X)+1+ncol(X)*ncol(Y))],nrow=ncol(X),ncol=ncol(Y))
> > x4 = x[(ncol(X)+1+ncol(X)*ncol(Y)+1):length(x)]
> > gr1 = rep(0,ncol(X))
> > gr4 = rep(0,ncol(Y))
> > gr3 = matrix(0,nrow=ncol(X),ncol=ncol(Y))
> > gr1.b = matrix(0,nrow=nrow(X),ncol=ncol(X))
> > gr2.b = rep(0,nrow(X))
> > eta = matrix(0,nrow=nrow(X),ncol=ncol(Y))
> > d.eta.3 = array(0,dim=c(nrow(X),ncol(X),ncol(Y)))
> > d.eta.4 = matrix(0,nrow=nrow(X),ncol=ncol(Y))
> > gr3.b1 = array(0,dim=c(nrow(X),ncol(X),ncol(Y)))
> > gr4.b1 = matrix(0,nrow=nrow(X),ncol=ncol(Y))
> >
> > # Gradient of alpha and beta
> >
> > for (i in 1:nrow(X))
> > {
> > gr1.b[i,] = 
> > (2*p_tilde[i]-1)*((exp(-t(x1)%*%X[i,]-x2)*X[i,])/(1+exp(-t(x1)%*%X[i,]-x2))^2)
> > gr2.b[i] = 
> > (2*p_tilde[i]-1)*((exp(-t(x1)%*%X[i,]-x2))/(1+exp(-t(x1)%*%X[i,]-x2))^2)
> > }
> > for (j in 1:ncol(X))
> > {
> > gr1[j] = sum(gr1.b[,j])
> > }
> > gr2 = sum(gr2.b)
> >
> >
> > # Gradient of w and gamma
> > for (i in 1:nrow(X))
> > {
> > for (t in 1:ncol(Y))
> > {
> > eta[i,t] = 1/(1+exp(-t(x3[,t])%*%X[i,]-x4[t]))
> > d.eta.3[i,,t] = eta[i,t]*(1-eta[i,t])*X[i,]
> > d.eta.4[i,t] = eta[i,t]*(1-eta[i,t])
> > gr3.b1[i,,t] = 
> > p_tilde[i]*((-abs(Y[i,t]-Yb[i]))*(1-eta[i,t])^(-1)+(1-abs(Y[i,t]-Yb[i]))*
> > (eta[i,t])^(-1))*d.eta.3[i,,t]
> > gr4.b1[i,t] = 
> > p_tilde[i]*((-abs(Y[i,t]-Yb[i]))*(1-eta[i,t])^(-1)+(1-abs(Y[i,t]-Yb[i]))*
> > (eta[i,t])^(-1))*d.eta.4[i,t]
> > }
> > }
> > for (t in 1:ncol(Y))
> > {
> > for (j in 1:ncol(X))
> > {
> > gr3[j,t] = sum(gr3.b1[,j,t])
> > }
> > gr4[t] = sum(gr4.b1[,t])
> > }
> > c(-gr1,-gr2,-gr3,-gr4)
> > }
> >
> > opt = optim(c(alpha[,c+1],beta[c+1],w,gamma),g,gr,method="BFGS")
> >
> > The problem is that it gives me wrong results, and I have noticed that if I 
> > change my function to maximize (for example if, instead of 
> > -(ncol(Y)*sum(res3)+sum(res3.b)), I try to maximize -(ncol(Y)*sum(res3)), 
> > it gives me exactly the same results... which is not possible!
> > So maybe I am using optim() in a wrong way...Does someone have an idea what 
> > could be wrong in my code ?
> >
> > Thank you very much in advance
> 
> 
> 
> -- 
> ===
> Jon Daily
> Technician
> ===
> #!/usr/bin/env outside
> # It's great, trust me.
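
A standard way to debug the kind of mismatch described above (a sketch, not
from the thread): verify the analytic gradient against a numerical one before
handing both to optim(). Shown here with a toy objective; the g() and gr()
from the message above would be substituted in. numDeriv is assumed to be
installed from CRAN.

```r
library(numDeriv)

g.toy  <- function(x) sum((x - 1)^2)   # objective to minimize
gr.toy <- function(x) 2 * (x - 1)      # its analytic gradient

x0 <- c(0.5, 2, -1)

# Should be essentially zero if the analytic gradient is correct;
# a large value here points at a bug in gr() (or in g()).
max(abs(gr.toy(x0) - grad(g.toy, x0)))

opt <- optim(x0, g.toy, gr.toy, method = "BFGS")
opt$par   # close to c(1, 1, 1)
```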
  

[R] Pie chart

2011-05-19 Thread Silvano
I made a pie chart and the names of the levels are outside 
the circle. How do I put the names of the levels within each 
sector?


names(tab13) = paste(c('Regular', 'Bom', 'Excelente'), 
round(100*prop.table(tab13), dig=1), "%")
pie(tab13, col=c("LightYellow", "lightgreen", 'lightblue', 
'white'), clockwise=F, radius=.7)
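
A sketch of one way to do it (not tested against the real tab13, which is
replaced by a stand-in below; the 0.5 radius factor is an assumption to tune):
draw the pie without labels, then place each label with text() at the midpoint
angle of its sector.

```r
# Stand-in for the tab13 from the message (any named numeric vector works)
tab13 <- c(Regular = 5, Bom = 8, Excelente = 7)

radius <- 0.7
pie(tab13, labels = NA, col = c("LightYellow", "lightgreen", "lightblue"),
    clockwise = FALSE, radius = radius)

p   <- tab13 / sum(tab13)
mid <- cumsum(p) - p / 2          # sector midpoints, as fractions of a turn

# pie() starts at 3 o'clock and runs counter-clockwise by default,
# so the midpoint fraction converts directly to an angle in radians
text(x = 0.5 * radius * cos(2 * pi * mid),
     y = 0.5 * radius * sin(2 * pi * mid),
     labels = names(tab13))
```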


Thanks,

--
Silvano Cesar da Costa
Departamento de Estatística
Universidade Estadual de Londrina
Fone: 3371-4346



[R] trouble with summary tables with several variables using aggregate function

2011-05-19 Thread Luma R
Dear all,

I am having trouble creating summary tables using aggregate function.

given the following table:


Var1 Var2 Var3 dummy
  S1   T1    I     1
  S1   T1    I     1
  S1   T1    D     1
  S1   T1    D     1
  S1   T2    I     1
  S1   T2    I     1
  S1   T2    D     1
  S1   T2    D     1
  S2   T1    I     1
  S2   T1    I     1
  S2   T1    D     1
  S2   T1    D     1
  S2   T2    I     1
  S2   T2    I     1
  S2   T2    I     1
  S2   T2    I     1


I want to create a summary table that shows for each category of Var1,
Var2, the number of cells that are Var3=D and Var3=I:

 Var1 Var2  Var3(D)  Var3(I)
   S1   T1        2        2
   S1   T2        2        2
   S2   T1        2        2
   S2   T2        0        4



However, if I do: Count.Cells=  aggregate(dummy~ Var1+Var2+Var3, FUN='sum')
, I get:

 Var1 Var2 Var3 Count of Resp
   S1   T1    D             2
   S1   T1    I             2
   S1   T2    D             2
   S1   T2    I             2
   S2   T1    D             2
   S2   T1    I             2
   S2   T2    I             4


Is there a way to get different columns for each Var3 level?


Thank you for any help you can give!



[R] Heteroskedastic regression model

2011-05-19 Thread Tomas Klepsys
Dear all,

Can someone please help me estimate a heteroskedastic regression model in R? I
know that in Stata the reghv function (regh package) can be used. What are its
equivalents in R?

Thanks in advance.

Best,
Tomas
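
One possibility (a sketch, not a drop-in equivalent of Stata's reghv): gls() in
the recommended nlme package can model the error variance as a function of a
covariate through a varFunc object. The names dat, y, x and z below are
placeholders, not from the original post; for plain heteroskedasticity-robust
standard errors, the sandwich and lmtest packages are another common route.

```r
library(nlme)

# Simulated data with error variance increasing in z (illustration only)
set.seed(1)
n <- 200
z <- runif(n)
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = exp(z))
dat <- data.frame(y, x, z)

# Mean model y ~ x, with Var(e_i) proportional to exp(2 * delta * z_i)
fit <- gls(y ~ x, data = dat, weights = varExp(form = ~ z))
summary(fit)
```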



[R] Separating boot results

2011-05-19 Thread Patrick Santoso
Good Morning,

I'm having what I hope to be a simple problem. I am generating bootstrap
confidence intervals using package (boot) - which works perfectly. The issue
I am having is getting the results into a format which I can write out to a
database. To be clear I am having no problems generating the results, I just
need to convert the format of the results such that I can store the results
in a dataframe to save out to a database.

I am doing the following:

## Generate 20,000 bootstrap samples
cod.boot <- boot(trimmed$ratio, cod, R=2)

## generate 90% BCA bootstrap confidence intervals
codboot <- boot.ci(cod.boot,conf = c(0.90), type= c("bca"))

At this point I have stored the answer I want to the variable codboot, but
the problem is it is in the format as follows:

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2 bootstrap replicates

CALL :
boot.ci(boot.out = cod.boot, conf = c(0.9), type = c("bca"))

Intervals :
Level   BCa
90%   ( 6.10, 10.23 )
Calculations and Intervals on Original Scale

What I would like is the 6.10, and 10.23 each stored in their own variables
so I can combine them into my existing dataframe (using cbind). The best
I've been able to do so far is:

a <- codboot[c(4)]

which gives me a:

$bca
 conf
[1,]  0.9 2343.41 19683.87 6.099788 10.23007

which is closer, but I cannot parse this variable enough to get the 6.10 and
10.23 out since apparently strsplit doesn't allow splitting on spaces.

Any help would be greatly appreciated!

Best Regards & Thank You in Advance

Patrick





-- 

Patrick Santoso

University of New Hampshire



Re: [R] Help, please

2011-05-19 Thread Mike Marchywka
> From: dwinsem...@comcast.net
> To: julio.flo...@spss.com.mx
> Date: Thu, 19 May 2011 10:40:08 -0400
> CC: r-help@r-project.org
> Subject: Re: [R] Help, please
>
>
> On May 18, 2011, at 6:29 PM, Julio César Flores Castro wrote:
>
> > Hi,
> >
> > I am using R 2.10.1 and I have a doubt. Do you know how many cases R
> > can handle?
>
> I was able to handle (meaning do Cox proportional hazards work with
> the 'rms' package which adds extra memory overhead with a datadist
> object) a 5.5 million rows by 100 columns dataframe without
> difficulty using 24 GB on a Mac (BSD UNIX kernel). I was running into
> performance slow downs related to paging out to virtual memory at 150
> columns, but after expanding to 32 GB can now handle 5.5 MM records
> with 200 columns without paging.
>
> >
> > I want to use the library npmc, but if I have more than 4,500 cases I
> > get an error message. If I use fewer than 4,500 cases I don't have
> > problems with this library.
> >
> > Is there any way to increase the number of cases in order to use this
> > library?
>
> 64 bit OS, 64 bit R, and more memory.
>
The longer-term solution is an implementation and algorithm designed to
increase the coherence of memory accesses (Firefox is doing this to me now,
dropping every few chars and getting many behind as it thrashes with a memory
leak, LOL).


[R] vectorial search for elements with certain attributes

2011-05-19 Thread infochat
I have 2 vectors A and B. For each element of B I'd like to find the index of 
the next higher or equal element in A. And I'd like to compute it efficiently. E.g.:
A <- c(1,3,7,8,10)
B <- c(5,8)

result: 3, 4

I have one possibility, but for long vectors it is not very efficient:

ans <- sapply(B, function(x) which.max(A[A < x]) )
as.integer(ans + 1)

Does anyone have a better idea?
Thank you for your help,
Thomas.
--



[R] Problem with Princurve

2011-05-19 Thread guy33
Hey all,

I can't seem to get the princurve package to produce correct results, even
in the simplest cases.  For example, if you just generate a 1 period
noiseless sine wave, and ask for the principal curve and plot, the returned
curve is clearly wrong (doesn't follow the sine wave).  Here's my code:

library(princurve)
x <- runif(1000,0,2*pi); x <- cbind(x/(2*pi), sin(x))
fit1 <- principal.curve(x, plot = TRUE)

Anyone have any suggestions?  If you run this code, do you get the correct
principal curve?

Any help would be really appreciated!
-guy33 
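
In case it helps anyone reproducing this: the fit is quite sensitive to the
smoother and the number of iterations, so one thing to try (a sketch; argument
availability depends on the princurve version installed, and whether it helps
here is untested) is a different smoother and more iterations:

```r
library(princurve)
set.seed(1)
x <- runif(1000, 0, 2 * pi)
x <- cbind(x / (2 * pi), sin(x))

# lowess smoothing and extra iterations sometimes track the wave better
fit2 <- principal.curve(x, smoother = "lowess", maxit = 50, plot = TRUE)
```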

--
View this message in context: 
http://r.789695.n4.nabble.com/Problem-with-Princurve-tp3535721p3535721.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] set x11 as default plot

2011-05-19 Thread cande
Hi,

I have the same question. Did you manage to make x11 your default plotting
device?

 - Chriss

--
View this message in context: 
http://r.789695.n4.nabble.com/set-x11-as-default-plot-tp2248427p3535824.html
Sent from the R help mailing list archive at Nabble.com.



[R] read.csv and FileEncoding in Windows version of R 2.13.0

2011-05-19 Thread Alexander Peterhansl
Dear Help List:

read.csv() seems to have changed in R version 2.13.0 as compared to version 
2.12.2 when reading in simple CSV files.

Consider reading a 2-column CSV file ("test.csv"), say:
1, a
2, b

If the file is encoded as UTF-8 (on Windows 7), then under R 2.13.0
read.csv("test.csv",fileEncoding="UTF-8",header=FALSE) yields the following 
output:
  V1
1  ?
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  invalid input found on input connection 'test.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'test.csv'

Under R 2.12.2 it runs problem-free and yields the expected:
  V1 V2
1  1  a
2  2  b

Please help.

Regards,
Alex



Re: [R] Shrink file size of pdf graphics

2011-05-19 Thread Aaron Mackey
You can try something like this, at the command line:

  gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 -dPDFSETTINGS=/screen
-dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

Evidently, the new compactPDF() function in R 2.13 does something very similar.

-Aaron
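
The in-R counterpart is a one-liner (a sketch; it needs qpdf or Ghostscript
available on the PATH, and "input.pdf" is a placeholder file name):

```r
library(tools)
# With gs_quality set, Ghostscript is used, mirroring the command above;
# otherwise compactPDF() compresses via qpdf.
compactPDF("input.pdf", gs_quality = "screen")
```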

On Thu, May 19, 2011 at 11:30 AM, Duncan Murdoch
 wrote:
>
> On 19/05/2011 11:14 AM, Layman123 wrote:
>>
>> Hi everyone,
>>
>> My data consists of a system of nearly 75000 roads, available as a
>> shapefile. When I plot the road system, by adding the individual roads with
>> 'lines' and store it as a pdf-file with 'pdf' I get a file of size 13 MB.
>> This is way too large to add it in my LaTeX-document, because there will be
>> some more graphics of this type.
>> Now I'm curious to learn whether there is a possibility in R to shrink the
>> file size of this graphic? I merely need it in a resolution so that it looks
>> "smooth" when printed out. I don't know much about the storage of R
>> graphics, but maybe there is a way to change the way the file is stored
>> perhaps as a pixel image?
>
>
> There are several possibilities.  You can use a bitmapped device (e.g. png()) 
> to save the image; pdflatex can include those.
>
> You can compress the .pdf file using an external tool like pdftk (or do it 
> internally in R 2.14.x, coming soon).
>
> There are probably others...
>
> Duncan Murdoch
>



Re: [R] dbetagen function in mc2d package

2011-05-19 Thread Ben Bolker
krusty the klown  libero.it> writes:

> I found this useful package for generalized beta, yet the function that
> calculates its density leaves me puzzled, especially when I plotted it:
> 
> plot(function(y) dbetagen(y,4,1.2,min=0,max=40),xlim=c(0,40))
> 
> The area between the density curve and the x axis does not seem to measure
> 1!!!

  It would help to tell us that this is in the "mc2d" package ...

> integrate(dbetagen,shape1=4,shape2=1.2,min=0,max=40,0,40)
40 with absolute error < 2e-06

  I would suggest that you report this as a bug to the
package maintainers [maintainer("mc2d")].  Seems fairly easy to 
fix for yourself, though -- just scale by the size of the domain (max-min).


  Ben Bolker
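
To make the rescaling concrete without depending on mc2d: a generalized beta on
[min, max] is a beta density evaluated at (y - min)/(max - min) and divided by
the width of the support, and that 1/(max - min) Jacobian is exactly what the
version discussed here appears to drop. A sketch using base R's dbeta():

```r
# Correctly normalized generalized-beta density on [min, max]
dgenbeta <- function(y, shape1, shape2, min = 0, max = 1) {
  dbeta((y - min) / (max - min), shape1, shape2) / (max - min)
}

# Integrates to 1, unlike the behaviour reported above
integrate(dgenbeta, lower = 0, upper = 40,
          shape1 = 4, shape2 = 1.2, min = 0, max = 40)
```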



Re: [R] matrix help (first occurrence of variable in column)

2011-05-19 Thread jim holtman
Is this what you are looking for:

> mdat3
   sp.1 sp.2 sp.3 sp.4 sp.5
T1    1    0    0    1    0
T2    1    0    0    1    0
T3    1    1    1    0    0
T4    1    0    1    1    1
>
> # create a matrix of when species first appeared
> first <- apply(mdat3, 2, function(x) (cumsum(x == 1) > 0) + 0L)
> # use first row as the number of starting species
> start <- sum(first[1,])
> # add column of new species; need diff to see growth
> mdat3 <- cbind(mdat3, new = c(0, diff(rowSums(first) - start)))
>
> mdat3
   sp.1 sp.2 sp.3 sp.4 sp.5 new
T1    1    0    0    1    0   0
T2    1    0    0    1    0   0
T3    1    1    1    0    0   2
T4    1    0    1    1    1   1
>
>



On Thu, May 19, 2011 at 9:46 AM, Michael Denslow
 wrote:
> On Wed, May 18, 2011 at 9:49 PM, jim holtman  wrote:
>> Is this what you were after:
>>
>>> mdat <- matrix(c(1,0,1,1,1,0), nrow = 2, ncol=3, byrow=TRUE,
>> +               dimnames = list(c("T1", "T2"),
>> +                               c("sp.1", "sp.2", "sp.3")))
>>>
>>> mdat
>>   sp.1 sp.2 sp.3
>> T1    1    0    1
>> T2    1    1    0
>>> # do 'rle' on each column and see if it is length >1 and starts with zero
>>> mdat.df <- as.data.frame(mdat)
>>> new.spec <- sapply(mdat.df, function(x){
>> +     x.rle <- rle(x)
>> +     (length(x.rle$values) > 1) & (x.rle$values[1L] == 0)
>> + })
>>> names(mdat.df)[new.spec]
>> [1] "sp.2"
>>>
>
> Thanks for your reply!
> This is close to what I want, but I think it only works if there are
> two rows. My actual data could have up to 8 rows (time samples).
>
> An example with 4 rows:
>
> mdat3 <- matrix(c(1,0,0,1,0,1,0,0,1,0,1,1,1,0,0,1,0,1,1,1), nrow = 4,
> ncol=5, byrow=TRUE,
>               dimnames = list(c("T1", "T2",'T3','T4'),
>                               c("sp.1", "sp.2", "sp.3","sp.4","sp.5")))
>
> mdat3
>
> mdat.df <- as.data.frame(mdat3)
> new.spec <- sapply(mdat.df, function(x){
>    x.rle <- rle(x)
>    (length(x.rle$values) > 1) & (x.rle$values[1L] == 0)
>        })
>
> names(mdat.df)[new.spec]
>
> It should say sp.5 since all the other species have occurred in other
> samples. Any further help would be much appreciated.
>
>
>>
>> On Wed, May 18, 2011 at 9:37 AM, Michael Denslow
>>  wrote:
>>> Dear R help,
>>> Apologies for the less than informative subject line. I will do my
>>> best to describe my problem.
>>>
>>> Consider the following matrix:
>>>
>>> mdat <- matrix(c(1,0,1,1,1,0), nrow = 2, ncol=3, byrow=TRUE,
>>>               dimnames = list(c("T1", "T2"),
>>>                               c("sp.1", "sp.2", "sp.3")))
>>>
>>> mdat
>>>
>>> In my actual data I have time (rows) and species occurrences (0/1
>>> values, columns). I want to count the number of new species that occur
>>> at a given time sample. For the matrix above the answer would be 1.
>>>
>>> Is there a simple way to figure out if the species has never occurred
>>> before and then sum them up?
>>>
>>> Thanks in advance,
>>> Michael
>>>
>>> --
>>> Michael Denslow
>>>
>>> I.W. Carpenter Jr. Herbarium [BOON]
>>> Department of Biology
>>> Appalachian State University
>>> Boone, North Carolina U.S.A.
>>> -- AND --
>>> Communications Manager
>>> Southeast Regional Network of Expertise and Collections
>>> sernec.org
>>>
>>> 36.214177, -81.681480 +/- 3103 meters
>>>
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>>
>
>
>
> --
> Michael Denslow
>
> I.W. Carpenter Jr. Herbarium [BOON]
> Department of Biology
> Appalachian State University
> Boone, North Carolina U.S.A.
> -- AND --
> Communications Manager
> Southeast Regional Network of Expertise and Collections
> sernec.org
>
> 36.214177, -81.681480 +/- 3103 meters
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] Creating a "shifted" month (one that starts not on the first of each month but on another date)

2011-05-19 Thread Dimitri Liakhovitski
Thanks a lot, Jim.
This is exactly what I was looking for!
Dimitri

On Thu, May 19, 2011 at 12:11 PM, jim holtman  wrote:
> try this:
>
> mydf<-data.frame(mydate=seq(as.Date("2011-01-01"), length = 92, by = "day"))
> # add 'day' to the dataframe
> mydf$day <- format(mydf$mydate, "%d")
> mydf$newfactor <- cumsum(mydf$day == '20')
> mydf
>
> On Thu, May 19, 2011 at 10:51 AM, Dimitri Liakhovitski
>  wrote:
>> Hello!
>> I have a data frame with dates. I need to create a new "month" that
>> starts on the 20th of each month - because I'll need to aggregate my
>> data later by that "shifted" month.
>> I wrote the code below and it works. However, I was wondering if there
>> is some ready-made function in some package - that makes it
>> easier/more elegant?
>> Thanks a lot!
>>
>> # Example data:
>> mydf<-data.frame(mydate=seq(as.Date("2011-01-01"), length = 92, by = "day"))
>> (mydf)
>>
>> ### Creating a new variable that has one value before
>> ### the 20th of each month and next value after it
>>
>> mydf$daynum<-as.numeric(format(mydf$mydate,"%d"))
>> library(zoo)
>> mydf$yearmon<-as.yearmon(mydf$mydate)
>> (mydf); str(mydf)
>>
>> mydf$newfactor<-NA
>> for(i in unique(mydf$yearmon)){ # looping through "yearmon" (important
>> because true data has many years of data)
>>        tempdf<-mydf[mydf$yearmon == i,]
>>        which.month<-which(unique(mydf$yearmon)==i)
>>        tempdf$newfactor[tempdf$daynum<20]<-which.month
>>        tempdf$newfactor[tempdf$daynum>19]<-(which.month+1)
>>        mydf[mydf$yearmon == i,]<-tempdf
>> }
>> (mydf)
>>
>> --
>> Dimitri Liakhovitski
>> Ninah Consulting
>> www.ninah.com
>>
>>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
>



-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com



[R] dbetagen function in mc2d package

2011-05-19 Thread krusty the klown
Hi all,
I found this useful package for generalized beta, yet the function that
calculates its density leaves me puzzled, especially when I plotted it:

plot(function(y) dbetagen(y,4,1.2,min=0,max=40),xlim=c(0,40))

The area between the density curve and the x axis does not seem to measure
1!!!
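One quick numerical check is integrate(). The sketch below uses base R's dbeta so it runs without mc2d installed; the commented line shows the analogous call one would point at dbetagen (assuming mc2d is available):

```r
# Numerically integrate a density over its support; the area should be ~1.
# For the generalized beta the analogous call would be:
#   integrate(function(y) dbetagen(y, 4, 1.2, min = 0, max = 40), 0, 40)
area <- integrate(dbeta, lower = 0, upper = 1, shape1 = 4, shape2 = 1.2)
area$value  # very close to 1
```

If the same check on dbetagen gives an area far from 1, either the arguments or the package's normalization deserves a closer look.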

--
View this message in context: 
http://r.789695.n4.nabble.com/dbetagen-function-in-mc2d-package-tp3535411p3535411.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]



Re: [R] Creating a "shifted" month (one that starts not on the first of each month but on another date)

2011-05-19 Thread jim holtman
try this:

mydf<-data.frame(mydate=seq(as.Date("2011-01-01"), length = 92, by = "day"))
# add 'day' to the dataframe
mydf$day <- format(mydf$mydate, "%d")
mydf$newfactor <- cumsum(mydf$day == '20')
mydf
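For completeness, here is a loop-free sketch that reproduces the numbering of the original loop (month index bumped by one from the 20th onward), using only base R's as.POSIXlt:

```r
# Vectorized version of the original loop: count whole months since the
# first date, then add one more whenever the day-of-month is >= 20.
mydf <- data.frame(mydate = seq(as.Date("2011-01-01"), length = 92, by = "day"))
lt <- as.POSIXlt(mydf$mydate)
mydf$newfactor <- (lt$year - lt$year[1]) * 12 + (lt$mon - lt$mon[1]) +
  1 + (lt$mday >= 20)
mydf[18:22, ]  # Jan 19 is still period 1; Jan 20 starts period 2
```

Because it counts calendar months explicitly, this also works across year boundaries, which matters for the "many years of data" case mentioned above.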

On Thu, May 19, 2011 at 10:51 AM, Dimitri Liakhovitski
 wrote:
> Hello!
> I have a data frame with dates. I need to create a new "month" that
> starts on the 20th of each month - because I'll need to aggregate my
> data later by that "shifted" month.
> I wrote the code below and it works. However, I was wondering if there
> is some ready-made function in some package - that makes it
> easier/more elegant?
> Thanks a lot!
>
> # Example data:
> mydf<-data.frame(mydate=seq(as.Date("2011-01-01"), length = 92, by = "day"))
> (mydf)
>
> ### Creating a new variable that has one value before
> ### the 20th of each month and next value after it
>
> mydf$daynum<-as.numeric(format(mydf$mydate,"%d"))
> library(zoo)
> mydf$yearmon<-as.yearmon(mydf$mydate)
> (mydf); str(mydf)
>
> mydf$newfactor<-NA
> for(i in unique(mydf$yearmon)){ # looping through "yearmon" (important
> because true data has many years of data)
>        tempdf<-mydf[mydf$yearmon == i,]
>        which.month<-which(unique(mydf$yearmon)==i)
>        tempdf$newfactor[tempdf$daynum<20]<-which.month
>        tempdf$newfactor[tempdf$daynum>19]<-(which.month+1)
>        mydf[mydf$yearmon == i,]<-tempdf
> }
> (mydf)
>
> --
> Dimitri Liakhovitski
> Ninah Consulting
> www.ninah.com
>
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] Converting Variable Name into String

2011-05-19 Thread Bert Gunter
1. This won't work. The lagged variables have length one less than the
originals.

2. How about:

lagged_Q <- data.frame( lapply( QuarterlyData,diff))

You can then change the names in lagged_Q to something like
lagged_originalName via paste() if you like.
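A sketch of points 1 and 2 combined, padding with NA so the differenced columns keep the original number of rows (the data frame and column names here are made up for illustration):

```r
# Hypothetical quarterly data; diff() shortens by one, so pad with NA
QuarterlyData <- data.frame(GdpPcSa = cumsum(rnorm(8)),
                            Cpi     = cumsum(rnorm(8)))
lagged_Q <- data.frame(lapply(QuarterlyData, function(v) c(NA, diff(v))))
names(lagged_Q) <- paste0("D1", names(QuarterlyData))
QuarterlyData <- cbind(QuarterlyData, lagged_Q)
names(QuarterlyData)  # "GdpPcSa" "Cpi" "D1GdpPcSa" "D1Cpi"
```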

3. I strongly suspect that none of this is necessary or wise: R  has
numerous time series modeling and graphical capabilities that handle
this automatically. I suggest you first  check the time series Task
View on CRAN to see if something there already does what you want.

-- Bert

On Thu, May 19, 2011 at 8:05 AM, Paolo Rossi
 wrote:
> Hello,
>
> I would like to create lagged and delta variables from a set of variables
> and then add them to a dataframe
>
> Suppose that GDPPcSa is a variable. I would like to be able to do this
>
> QuarterlyData$D1GdpPcSa = diff(GDPPcSa , 1)
>
> in an automated fashion so that I loop over Quarterly data to compute the
> first difference of its variables and add them to the dataframe.
>
> It would be great to get a way to create the string "D1GdpPcSa" knowing
> that the name of the var is GdpPcSa. Then I can add the variable D1GdpPcSa
> to the dataframe and work on its names attribute.
>
> Thanks
>
> Paolo
>
>        [[alternative HTML version deleted]]
>
>



-- 
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics



Re: [R] *not* using attach() *but* in one case ....

2011-05-19 Thread Prof Brian Ripley

On Thu, 19 May 2011, Liaw, Andy wrote:


From: Prof Brian Ripley


Hmm, load() does have an 'envir' argument.  So you could simply use
that and with() (which is pretty much what attach() does internally).

If people really wanted a lazy approach, with() could be extended to
allow file names (as attach does).
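The load()-into-an-environment idiom described above can be sketched like this (the tempfile() path is just for illustration):

```r
# Save an object, then load it into a private environment rather than
# the global workspace, and evaluate against it with with().
x <- 1:10
tf <- tempfile(fileext = ".rda")
save(x, file = tf)
rm(x)
e <- new.env()
load(tf, envir = e)   # nothing lands in the global workspace
with(e, mean(x))      # 5.5
```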


I'm not sure if laziness like this should be encouraged.


As you can tell from my wording, I was not (and indeed consider the 
change to attach() to accept .rda files to have been a mistake).



If I may bring up another "black hole":  IMHO the formula interface
allows too much flexibility (perhaps to allow some laziness?) that
beginners and even non-beginners fall into its various traps a bit too
often, and sometimes not even aware of it.  It would be great if there's
a way to (optionally?) limit the scope of where a formula looks for
variables.


Note, there is no such thing as 'the formula interface'.  There are 
several, but I expect you mean model.frame.default, for that is what 
is most commonly used to collect together the variables from which to 
interpret a formula.  Now in my view model.frame suffers from having 
been designed for S's scoping rules and subverted for R by users of 
lexical scoping.  So it has to work for both approaches (or break even 
more legacy code than the 1.2.0 changes did).   In my experience the 
most common trap is to pick up variables in the workspace when 'data' 
(or 'newdata') is supplied, and that is unavoidable if you allow 
lexical scoping as the workspace is almost always in the environment 
of the formula.  You can change the latter of course, but I suspect 
that in 99.9% of cases if a user supplies 'data' he intends to get all 
the variables from 'data'.


This is not really the place for such a discussion, so if you want to 
pursue it please restart it on R-devel.





Just my $0.02...

Andy


On Thu, 19 May 2011, Martin Maechler wrote:


[modified 'Subject' on purpose;
Good mail readers will still thread correctly, using the

'References'

and 'In-Reply-To' headers, however, unfortunately,
in my limited experience, good mail readers seem to

disappear more and more ..

]


Peter Ehlers 
on Tue, 17 May 2011 06:08:30 -0700 writes:


  > On 2011-05-17 02:22, Timothy Bates wrote:
  >> Dear Bryony: the suggestion was not to change the name of
  >> the data object, but to explicitly tell glm.nb what
  >> dataset it should look in to find the variables you
  >> mention in the formula.
  >>
  >> so the salient difference is:
  >>
  >> m1<- glm.nb(Cells ~ Cryogel*Day, data = side)
  >>
  >> instead of
  >>
  >> attach(side) m1<- glm.nb(Cells ~ Cryogel*Day)
  >>
  >> This works for other functions also, but not uniformly as
  >> yet (how I wish it did and I could say hist(x, data=side)
  >> Instead of hist(side$x)
  >>
  >> this inconsistency encourages the need for attach()

  > Only if the user hasn't yet been introduced to the with()
  > function, which is linked to on the ?attach page.

  > Note also this sentence from the ?attach page:
  > " attach can lead to confusion."

  > I can't remember the last time I needed attach().
  > Peter Ehlers

Well, then you don't know  *THE ONE* case where modern users of
R should use attach() ... as I have been teaching for a while,
but seem not to have got enough students listening ;-) ...

 ---  Use it instead of load()  {for save()d R objects} ---

The advantage of attach() over load() there is that loaded
objects (and there may be a bunch!) are put into a separate
place in the search path and will not accidentally overwrite
objects in the global "workspace".
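A minimal demonstration of that point (a sketch; the saved file is a tempfile() stand-in):

```r
# attach() on a saved image puts the loaded objects at position 2 of the
# search path instead of overwriting same-named objects in the workspace.
x <- 1:5
tf <- tempfile(fileext = ".rda")
save(x, file = tf)
x <- "precious"                   # workspace object with the same name
attach(tf)                        # note: attach(), not load()
ws_copy    <- x                   # still "precious" - the workspace wins
saved_copy <- get("x", pos = 2)   # the 1:5 from the file
detach()
```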

Of course, there are still quite a few situations {e.g. in
typical BATCH use of R for simulations, or Sweaving, etc} where
load() is good enough, and the extras of using attach() are not
worth it.

But the unconditional  "do not use attach()"
is not quite ok,
at least not when you talk to non-beginners.

Martin Maechler, ETH Zurich


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax:  +44 1865 272595



Notice:  This e-mail message, together with any attachme...{{dropped:11}}




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.a

Re: [R] Shrink file size of pdf graphics

2011-05-19 Thread Duncan Murdoch

On 19/05/2011 11:14 AM, Layman123 wrote:

Hi everyone,

My data consists of a system of nearly 75000 roads, available as a
shapefile. When I plot the road system, by adding the individual roads with
'lines' and store it as a pdf-file with 'pdf' I get a file of size 13 MB.
This is way too large to add it in my LaTeX-document, because there will be
some more graphics of this type.
Now I'm curious to learn whether there is a possibility in R to shrink the
file size of this graphic? I merely need it in a resolution so that it looks
"smooth" when printed out. I don't know much about the storage of R
graphics, but maybe there is a way to change the way the file is stored
perhaps as a pixel image?



There are several possibilities.  You can use a bitmapped device (e.g. 
png()) to save the image; pdflatex can include those.


You can compress the .pdf file using an external tool like pdftk (or do 
it internally in R 2.14.x, coming soon).


There are probably others...
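A sketch of the first option (the file name and plot content are placeholders for the road data):

```r
# Render to a 300-dpi PNG instead of a vector PDF; pdflatex can
# \includegraphics the .png directly.
f <- file.path(tempdir(), "roads.png")
png(f, width = 6, height = 6, units = "in", res = 300)
plot(cumsum(rnorm(1000)), type = "l", xlab = "", ylab = "")
dev.off()
file.info(f)$size  # typically a small fraction of the vector file
```

For 75000 line segments the bitmap size depends only on resolution, not on the number of segments, which is exactly what helps here.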

Duncan Murdoch



Re: [R] Shrink file size of pdf graphics

2011-05-19 Thread Prof Brian Ripley

This was answered on this list a few days ago.  See

https://stat.ethz.ch/pipermail/r-help/2011-May/278029.html

On Thu, 19 May 2011, Layman123 wrote:


Hi everyone,

My data consists of a system of nearly 75000 roads, available as a
shapefile. When I plot the road system, by adding the individual roads with
'lines' and store it as a pdf-file with 'pdf' I get a file of size 13 MB.
This is way too large to add it in my LaTeX-document, because there will be
some more graphics of this type.
Now I'm curious to learn whether there is a possibility in R to shrink the
file size of this graphic? I merely need it in a resolution so that it looks
"smooth" when printed out. I don't know much about the storage of R
graphics, but maybe there is a way to change the way the file is stored
perhaps as a pixel image?


Of course!  What does ?Devices show on your unstated system and 
unstated R version?  Most likely it will contain something like


• ‘png’ PNG bitmap device

• ‘jpeg’ JPEG bitmap device

• ‘bmp’ BMP bitmap device

• ‘tiff’ TIFF bitmap device

• ‘bitmap’ bitmap pseudo-device via ‘Ghostscript’ (if
  available).


Thank you very much in advance! Any tips are appreciated much!


Tip: follow the footer of this message and do your homework before 
posting.




Regards
Roman




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax:  +44 1865 272595


Re: [R] *not* using attach() *but* in one case ....

2011-05-19 Thread Duncan Murdoch

On 19/05/2011 10:34 AM, Liaw, Andy wrote:

From: Prof Brian Ripley
>
>  Hmm, load() does have an 'envir' argument.  So you could simply use
>  that and with() (which is pretty much what attach() does internally).
>
>  If people really wanted a lazy approach, with() could be extended to
>  allow file names (as attach does).

I'm not sure if laziness like this should be encouraged.

If I may bring up another "black hole":  IMHO the formula interface
allows too much flexibility (perhaps to allow some laziness?) that
beginners and even non-beginners fall into its various traps a bit too
often, and sometimes not even aware of it.  It would be great if there's
a way to (optionally?) limit the scope of where a formula looks for
variables.


I think there is:  put the variables in an environment that doesn't have 
parents covering everything visible, and use that as the environment of 
the formula.


For example, you could follow

x <- 1:10
y <- rnorm(10)
z <- rnorm(10)
f <- y ~ x + z

with

e <- new.env(parent=baseenv())
e$x <- x
e$y <- y
environment(f) <- e

and you'll get a failure with

lm(f)

because it can't find z.  Obviously this could be wrapped in a 
friendlier function if you really wanted it.


Duncan Murdoch





Just my $0.02...

Andy

>  On Thu, 19 May 2011, Martin Maechler wrote:
>
>  >  [modified 'Subject' on purpose;
>  >  Good mail readers will still thread correctly, using the
>  'References'
>  >  and 'In-Reply-To' headers, however, unfortunately,
>  >  in my limited experience, good mail readers seem to
>  disappear more and more ..
>  >  ]
>  >
>  >>  Peter Ehlers
>  >>  on Tue, 17 May 2011 06:08:30 -0700 writes:
>  >
>  > >  On 2011-05-17 02:22, Timothy Bates wrote:
>  > >>  Dear Bryony: the suggestion was not to change the name of
>  > >>  the data object, but to explicitly tell glm.nb what
>  > >>  dataset it should look in to find the variables you
>  > >>  mention in the formula.
>  > >>
>  > >>  so the salient difference is:
>  > >>
>  > >>  m1<- glm.nb(Cells ~ Cryogel*Day, data = side)
>  > >>
>  > >>  instead of
>  > >>
>  > >>  attach(side) m1<- glm.nb(Cells ~ Cryogel*Day)
>  > >>
>  > >>  This works for other functions also, but not uniformly as
>  > >>  yet (how I wish it did and I could say hist(x, data=side)
>  > >>  Instead of hist(side$x)
>  > >>
>  > >>  this inconsistency encourages the need for attach()
>  >
>  > >  Only if the user hasn't yet been introduced to the with()
>  > >  function, which is linked to on the ?attach page.
>  >
>  > >  Note also this sentence from the ?attach page:
>  > >  " attach can lead to confusion."
>  >
>  > >  I can't remember the last time I needed attach().
>  > >  Peter Ehlers
>  >
>  >  Well, then you don't know  *THE ONE* case where modern users of
>  >  R should use attach() ... as I have been teaching for a while,
>  >  but seem not to have got enough students listening ;-) ...
>  >
>  >   ---  Use it instead of load()  {for save()d R objects} ---
>  >
>  >  The advantage of attach() over load() there is that loaded
>  >  objects (and there may be a bunch!) are put into a separate
>  >  place in the search path and will not accidentally overwrite
>  >  objects in the global "workspace".
>  >
>  >  Of course, there are still quite a few situations {e.g. in
>  >  typical BATCH use of R for simulations, or Sweaving, etc} where
>  >  load() is good enough, and the extras of using attach() are not
>  >  worth it.
>  >
>  >  But the unconditional  "do not use attach()"
>  >  is not quite ok,
>  >  at least not when you talk to non-beginners.
>  >
>  >  Martin Maechler, ETH Zurich
>
>  -- 
>  Brian D. Ripley,  rip...@stats.ox.ac.uk

>  Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>  University of Oxford, Tel:  +44 1865 272861 (self)
>  1 South Parks Road, +44 1865 272866 (PA)
>  Oxford OX1 3TG, UK    Fax:  +44 1865 272595
>
>
Notice:  This e-mail message, together with any attachme...{{dropped:11}}





Re: [R] within()

2011-05-19 Thread David Winsemius


On May 19, 2011, at 9:32 AM, Timothy Bates wrote:



Likewise it would help reduce confusion when
plot(~Source, data=Df) # works
# but
boxplot(~Source, data=Df)
# Error in boxplot.formula(~Source, data = Df) :
#  'formula' missing or incorrect

The formula isn’t missing or ill-formed; it’s that boxplot requires  
formulae to have something on the left hand side.


It may not be "illformed" but the error message says instead that it  
is "incorrect". Read the help page for boxplot (which would be the  
first thing a right-thinking person would do upon getting such a  
message) and it clearly implies that there needs to be a LHS  
component. I would rather the R Core spend its time on other areas.


--

David Winsemius, MD
West Hartford, CT



[R] Shrink file size of pdf graphics

2011-05-19 Thread Layman123
Hi everyone,

My data consists of a system of nearly 75000 roads, available as a
shapefile. When I plot the road system, by adding the individual roads with
'lines' and store it as a pdf-file with 'pdf' I get a file of size 13 MB.
This is way too large to add it in my LaTeX-document, because there will be
some more graphics of this type.
Now I'm curious to learn whether there is a possibility in R to shrink the
file size of this graphic? I merely need it in a resolution so that it looks
"smooth" when printed out. I don't know much about the storage of R
graphics, but maybe there is a way to change the way the file is stored
perhaps as a pixel image?

Thank you very much in advance! Any tips are appreciated much!

Regards
Roman

--
View this message in context: 
http://r.789695.n4.nabble.com/Shrink-file-size-of-pdf-graphics-tp3536042p3536042.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] R Style Guide -- Was Post-hoc tests in MASS using glm.nb

2011-05-19 Thread Spencer Graves
Is a list like Henrik's (below) available someplace on r-project.org or 
on your local CRAN mirror?  If no, where do you think it might most 
conveniently fit?  What about "www.r-project.org/other-docs.html"? 
Perhaps add a bullet right under the link to "DevCheatSheet has a 
collection of R Reference Cards" something like "For a list of R 
Programming Style Guides, see ...", with a link to a page giving 
Henrik's list.  Or make it a CRAN Task View on "R Programming Style 
Guides"?



Spencer


On 5/18/2011 7:17 PM, Henrik Bengtsson wrote:

On Wed, May 18, 2011 at 6:28 PM, David Scott  wrote:

  Another style guide is at:
http://www1.maths.lth.se/help/R/RCC/

Listed as a first draft and dated 2005, but still worth a read. Has some
references also.

That URL is obsolete (I need to have it removed) - a more recent/stable
URL is [5] below.

LIST OF CONVENTIONS/STYLES FOR R:

[1] R coding standards in the R Internals manual
   http://www.cran.r-project.org/doc/manuals/R-ints.html#R-coding-standards

[2] Bioconductor coding standards
   http://wiki.fhcrc.org/bioc/Coding_Standards

[3] Google R style
[http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html]

[4] R style guide by Hadley Wickham (based on [3])
   http://had.co.nz/stat405/resources/r-style-guide.html

[5] R Coding Conventions (RCC) - a draft by Henrik Bengtsson
   http://aroma-project.org/developers/RCC

[6] The Aroma R Coding Conventions (Aroma RCC) by Henrik Bengtsson
(based on [5])
   http://aroma-project.org/developers/AromaRCC

Note that there are often different objectives driving the different
coding styles, which is why it makes little sense to debate certain
items. As an example, one convention may favor portability to another
language limiting it to not use S4 (just an example).

/Henrik


I think I recall Hadley having a style guide which he requested his students
followed, but I didn't like it too much (sorry Hadley) .

I am with Bill that style guides should be consulted and their
recommendations considered, but it is personal preference as to which rules
one accepts. I don't find it objectionable if someone has written in a style
I don't particularly like, but it is objectionable if no thought has been
given to programming style.

David Scott


On 19/05/11 10:26, bill.venab...@csiro.au wrote:

Hi Bert,

I think people should know about the Google Style Guide for R because, as
I said, it represents a thoughtful contribution to the debate.  Most of its
advice is very good (meaning I agree with it!) but some is a bit too much
(for example, the blanket advice never to use S4 classes and methods -
that's just resisting progress, in my view).  The advice on using<- for the
(normal) assingment operator rather than = is also good advice, (according
to me), but people who have to program in both C and R about equally often
may find it a bit tedious.  We can argue over that one.

I suggest it has a place in the R FAQ but with a suitable warning that
this is just one view, albeit a thoughtful one.  I don't think it need be
included in the posting guide, though.  It would take away some of the fun.
  :-)

Bill Venables.

-Original Message-
From: Bert Gunter [mailto:gunter.ber...@gene.com]
Sent: Wednesday, 18 May 2011 11:47 PM
To: Venables, Bill (CMIS, Dutton Park)
Cc: r-help@r-project.org
Subject: R Style Guide -- Was Post-hoc tests in MASS using glm.nb

Thanks Bill. Do you and others think that a link to this guide (or
another)should be included in the Posting Guide and/or R FAQ?

-- Bert

On Tue, May 17, 2011 at 4:07 PM,wrote:

Amen to all of that, Bert.  Nicely put.  The google style guide (not
perfect, but a thoughtful contribution on these kinds of issues) has
avoiding attach() as its very first line.  See
http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html)

I would add, though, that not enough people seem yet to be aware of
within(...), a companion of with(...) in a way, but used for modifying data
frames or other kinds of list objects.  It should be seen as a more flexible
replacement for transform() (well, almost).

The difference between with() and within() is as follows:

with(data, expr, ...)

allows you to evaluate 'expr' with 'data' providing the primary source
for variables, and returns *the evaluated expression* as the result.  By
contrast

within(data, expr, ...)

again uses 'data' as the primary source for variables when evaluating
'expr', but now 'expr' is used to modify the varibles in 'data' and returns
*the modified data set* as the result.
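The contrast can be seen in two lines (a minimal sketch):

```r
d  <- data.frame(x = 1:3)
w1 <- with(d, x + 1)          # the evaluated expression: 2 3 4
w2 <- within(d, y <- x + 1)   # the data frame, now with a new column y
```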

I use this a lot in the data preparation phase of a project, especially,
which is usually the longest, trickiest, most important, but least discussed
aspect of any data analysis project.

Here is a simple example using within() for something you cannot do in
one step with transform():

polyData<- within(data.frame(x = runif(500)), {
  x2<- x^2
  x3<- x*x2
  b<- runif(4)
  eta<- cbind(1,x,x2,x3) %*% b
  y<- eta + rnorm(x, sd = 0.5)
  rm(b)
})

chec

[R] Converting Variable Name into String

2011-05-19 Thread Paolo Rossi
Hello,

I would like to create lagged and delta variables from a set of variables
and then add them to a dataframe

Suppose that GDPPcSa is a variable. I would like to be able to do this

QuarterlyData$D1GdpPcSa = diff(GDPPcSa , 1)

in an automated fashion so that I loop over Quarterly data to compute the
first difference of its variables and add them to the dataframe.

It would be great to get a way to create the string "D1GdpPcSa" knowing
that the name of the var is GdpPcSa. Then I can add the variable D1GdpPcSa
to the dataframe and work on its names attribute.

Thanks

Paolo

[[alternative HTML version deleted]]



[R] Creating a "shifted" month (one that starts not on the first of each month but on another date)

2011-05-19 Thread Dimitri Liakhovitski
Hello!
I have a data frame with dates. I need to create a new "month" that
starts on the 20th of each month - because I'll need to aggregate my
data later by that "shifted" month.
I wrote the code below and it works. However, I was wondering if there
is some ready-made function in some package - that makes it
easier/more elegant?
Thanks a lot!

# Example data:
mydf<-data.frame(mydate=seq(as.Date("2011-01-01"), length = 92, by = "day"))
(mydf)

### Creating a new variable that has one value before
### the 20th of each month and next value after it

mydf$daynum<-as.numeric(format(mydf$mydate,"%d"))
library(zoo)
mydf$yearmon<-as.yearmon(mydf$mydate)
(mydf); str(mydf)

mydf$newfactor<-NA
for(i in unique(mydf$yearmon)){ # looping through "yearmon" (important
because true data has many years of data)
tempdf<-mydf[mydf$yearmon == i,]
which.month<-which(unique(mydf$yearmon)==i)
tempdf$newfactor[tempdf$daynum<20]<-which.month
tempdf$newfactor[tempdf$daynum>19]<-(which.month+1)
mydf[mydf$yearmon == i,]<-tempdf
}
(mydf)

-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com



Re: [R] Help, please

2011-05-19 Thread David Winsemius


On May 18, 2011, at 6:29 PM, Julio César Flores Castro wrote:


Hi,

I am using R 2.10.1 and I have a doubt. Do you know how many cases can R handle?
handle?


I was able to handle (meaning do Cox proportional hazards work with  
the 'rms' package which adds extra memory overhead with a datadist  
object)  a 5.5 million rows by 100 columns dataframe without  
difficulty using 24 GB on a Mac (BSD UNIX kernel). I was running into  
performance slow downs related to paging out to virtual memory at 150  
columns, but after expanding to 32 GB can now handle 5.5 MM records  
with 200 columns without paging.




I want to use the library npmc but if I have more than 4,500 cases I get an
error message. If I use fewer than 4,500 cases I don't have problems with this
library.

library.

Is there any way to increase the number of cases in order to use this
library.


64 bit OS, 64 bit R,  and more memory.

--
David Winsemius, MD
West Hartford, CT



Re: [R] within()

2011-05-19 Thread Duncan Murdoch

On 19/05/2011 9:32 AM, Timothy Bates wrote:

The most interesting thing in this thread for me was "within()" - that is what 
I want 99% of the time, but I had only ever heard of with()

A real boon! Thanks Bill V!

The help for within gives this example for with(), which seems unnecessary, as 
glm supports data=

with(anorexia, {
 anorex.1<- glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian)
 summary(anorex.1)
})

especially when this is easier to read:
   m<- glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, 
data=anorexia)
   summary(m)

The Rd file also has (just?) one example for within… I suggest deleting the 
anorexia example so the within example comes first, and perhaps supplementing 
with

## Not run:
# More advanced example of usage (not run)
model1<- glm.nb(Cells ~ Cryogel*Day, data = myData)
myData2<- within(myData, Cryogel<- relevel(Cryogel, ref = "2"))
model2 <- update(model1, data = myData2)
## End(Not run)


Also, the help page for transform does not back link to within… That would be 
helpful. It also uses an example with attach, which might encourage less good 
habits.

Best wishes, and thanks,
tim
PS: How does one best contribute to help files?


One way is to post suggestions to the R-devel list.  For the suggestion 
to add the See also link, that's probably best (but no need to do it 
now, I'll add it).  The problem with posting suggestions to R-help is 
that they are more likely to get lost.


For more extensive suggestions, it's best if you actually do the work 
and post a patch.  Coming up with good examples is hard, and so often 
the response to a suggestion like "the example should be better" is "I 
agree", rather than actually changing it.  But if you put together an 
example that is obviously better, it's easy to just paste it in, and so 
more likely to happen.


In this particular case, I don't agree that your change is an 
improvement.  "Not run" examples are bad, because they are rarely 
tested.  A more compelling use of with() would be a nice change to the 
help page, if you want to put one together.




PPS: If R was to go to google summer of code, apart from helpful help, my 
wishlist would prioritise standardising all common functions to (where possible)
1. Have a formula interface
2. Support “data=“
3. Use a uniform na handling, with examples


R does have GSOC projects in the works.  A call for proposals was posted 
to the R-devel list back in February; you can read about it here:  
http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2011, with 
links to current discussion.



Diverse NA behavior catches most of our students out (all three departments 
of our school now use R exclusively at the PG level).
NA behaviour can be:
na.rm=, as in rowMeans(x, na.rm = TRUE)
use=, as in cor(use = "everything")   # with well-documented effects in 
?cor, and all legal options shown 
(everything|all.obs|complete.obs|na.or.complete|pairwise.complete.obs)
na.action=, as in t.test(na.action = "na.omit")   # with no examples in ?t.test, 
nor a list of legal values
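The three conventions side by side, on a toy vector (illustrative only, not from the thread):

```r
# The three NA conventions in one place (toy data).
x <- c(1, 2, NA, 4)
y <- c(2, 4, 6, 8)
mean(x)                          # NA -- missing values propagate by default
mean(x, na.rm = TRUE)            # na.rm= convention: 2.333333
cor(x, y, use = "complete.obs")  # use= convention: drops the NA pair, gives 1
t.test(extra ~ group, data = sleep, na.action = na.omit)  # na.action= convention
```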

Likewise it would help reduce confusion when
plot(~Source, data=Df) # works
# but
boxplot(~Source, data=Df)
# Error in boxplot.formula(~Source, data = Df) :
#  'formula' missing or incorrect

The formula isn't missing or ill-formed; it's that boxplot() requires formulae to 
have something on the left-hand side.
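A minimal runnable version of the contrast, using the built-in InsectSprays data as a stand-in for Df (Source becomes the factor spray):

```r
# plot() accepts a one-sided formula, boxplot() does not:
plot(~ spray, data = InsectSprays)           # works -- bar chart of the factor
boxplot(count ~ spray, data = InsectSprays)  # works once a response is supplied
# boxplot(~ spray, data = InsectSprays)      # would fail as described above
```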



Tested fixes should be posted to R-devel.  Generally patches should be 
based on the R-devel (trunk) source, available at 
https://svn.r-project.org/R/trunk.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] *not* using attach() *but* in one case ....

2011-05-19 Thread Liaw, Andy
From: Prof Brian Ripley
> 
> Hmm, load() does have an 'envir' argument.  So you could simply use 
> that and with() (which is pretty much what attach() does internally).
> 
> If people really wanted a lazy approach, with() could be extended to 
> allow file names (as attach does).

I'm not sure if laziness like this should be encouraged.  

If I may bring up another "black hole":  IMHO the formula interface
allows too much flexibility (perhaps to allow some laziness?) that
beginners and even non-beginners fall into its various traps a bit too
often, and sometimes not even aware of it.  It would be great if there's
a way to (optionally?) limit the scope of where a formula looks for
variables.
 
Just my $0.02...

Andy
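For the record, the non-lazy spelling of what Ripley describes above -- load() into a private environment, then with() -- takes only a few lines (a sketch using a temporary file):

```r
# Save an object, then load it into a private environment so nothing
# in the global workspace can be overwritten accidentally.
x <- 1:10
tf <- tempfile(fileext = ".RData")
save(x, file = tf)
rm(x)

e <- new.env()
load(tf, envir = e)   # loaded objects land in 'e', not the global workspace
with(e, mean(x))      # 5.5 -- 'x' is looked up inside 'e'
unlink(tf)
```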

> On Thu, 19 May 2011, Martin Maechler wrote:
> 
> > [modified 'Subject' on purpose;
> > Good mail readers will still thread correctly, using the 
> 'References'
> > and 'In-Reply-To' headers, however, unfortunately,
> > in my limited experience, good mail readers seem to 
> disappear more and more ..
> > ]
> >
> >> Peter Ehlers 
> >> on Tue, 17 May 2011 06:08:30 -0700 writes:
> >
> >> On 2011-05-17 02:22, Timothy Bates wrote:
> >>> Dear Bryony: the suggestion was not to change the name of
> >>> the data object, but to explicitly tell glm.nb what
> >>> dataset it should look in to find the variables you
> >>> mention in the formula.
> >>>
> >>> so the salient difference is:
> >>>
> >>> m1<- glm.nb(Cells ~ Cryogel*Day, data = side)
> >>>
> >>> instead of
> >>>
> >>> attach(side) m1<- glm.nb(Cells ~ Cryogel*Day)
> >>>
> >>> This works for other functions also, but not uniformly as
> >>> yet (how I wish it did and I could say hist(x, data=side)
> >>> Instead of hist(side$x)
> >>>
> >>> this inconsistency encourages the need for attach()
> >
> >> Only if the user hasn't yet been introduced to the with()
> >> function, which is linked to on the ?attach page.
> >
> >> Note also this sentence from the ?attach page:
> >> " attach can lead to confusion."
> >
> >> I can't remember the last time I needed attach().
> >> Peter Ehlers
> >
> > Well, then you don't know  *THE ONE* case where modern users of
> > R should use attach() ... as I have been teaching for a while,
> > but seem not to have got enough students listening ;-) ...
> >
> >  ---  Use it instead of load()  {for save()d R objects} ---
> >
> > The advantage of attach() over load() there is that loaded
> > objects (and there may be a bunch!), are put into a separate
> > place in the search path and will not accidentally overwrite
> > objects in the global "workspace".
> >
> > Of course, there are still quite a few situations {e.g. in
> > typical BATCH use of R for simulations, or Sweaving, etc} where
> > load() is good enough, and the extras of using attach() are not
> > worth it.
> >
> > But the unconditional  "do not use attach()"
> > is not quite ok,
> > at least not when you talk to non-beginners.
> >
> > Martin Maechler, ETH Zurich
> 
> -- 
> Brian D. Ripley,  rip...@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel:  +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UKFax:  +44 1865 272595
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



Re: [R] Does a data frame have indexes for columns?

2011-05-19 Thread Santosh Srinivas
Check data.table ... it brings database-style indexing (keys and binary search) to data frames.
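A sketch of that suggestion (assumes the data.table package is installed; setkey() sorts the table and marks the key column, so keyed subsets use binary search rather than a full vector scan):

```r
library(data.table)
dt <- data.table(id = sample(1:1e5), val = rnorm(1e5))
setkey(dt, id)   # the closest analogue of a database index
dt[J(42L)]       # keyed lookup by binary search, not a vector scan
```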

On Thu, May 19, 2011 at 7:41 PM, xiagao1982  wrote:
> Hi all,
>
> I wonder if I can create indexes for columns in a data frame to speed up data 
> selection by matching column values, just like indexes for large tables in a 
> relational database? Thank you!
>
>
>
>
> xiagao1982
> 2011-05-19
>
>



[R] Does a data frame have indexes for columns?

2011-05-19 Thread xiagao1982
Hi all,

I wonder if I can create indexes for columns in a data frame to speed up data 
selection by matching column values, just like indexes for large tables in a 
relational database? Thank you!




xiagao1982
2011-05-19



Re: [R] R Style Guide -- Was Post-hoc tests in MASS using glm.nb

2011-05-19 Thread Bert Gunter
Thanks Martin,

Your points are, of course, well taken. Nevertheless, I still think it
might be useful to put a link or links to one or more style guides in
the FAQ with a comment to the effect that these are various
recommended ways to help write better, more readable code. Something
like:

--
Q: What are best practices for R coding style?

A: There is no simple answer to this question, as programming style is
legitimately a personal choice and may depend on the nature of the
programming task. However, useful guidelines and alternatives can be
found at .

---

One important point: The links need to be reasonably stable, and this
could be problematic.

However, I am happy to defer to R Core and experienced R programmers
like yourself on these matters.

Cheers to all,
Bert

On Thu, May 19, 2011 at 3:26 AM, Martin Maechler
 wrote:
>
>    BertG> Thanks Bill. Do you and others think that a link to
>    BertG> this guide (or another)should be included in the
>    BertG> Posting Guide and/or R FAQ?
>
> Hmm, that guide is Google's work, and is probably quite good if
> you have a group of R programmers in the same company,
> but e.g., has not been published in collaboration with the R core team,
> and actually somewhat differs from our own .. much less formal
> and not officially laid down styles
> {yes: "s".. but we still have a few parts we agree on...}
>
> 2nd problem with any style guide: "Base R" already comes
> with several thousand of functions, classes and other objects, which
> have grown from more than 20 years of S, S+ and then R history,
> and most things cannot feasibly be changed now (or could, say 5
> years ago), for back compatibility reasons.
>
> Further,... more philosophically:
> For many of us, programming (R or otherwise) is considered
> a creative activity to quite some extent, and creativity can be
> crushed by too rigid rules.  I'd state that cultural history
> shows that human culture implicitly follows many rules, but it
> is (almost) only interesting because there are enough exceptions
> to those rules.
>
> Martin Maechler
>       @ ETH Zurich and R Core Team since its inception
>       but speaking entirely for myself ..
>


-- 
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] McKinsey Global Institute Report on Big Data has potential implications for R

2011-05-19 Thread Dimitri Liakhovitski
Thought I'd share with the community: A very interesting (free) report
by McKinsey Global Institute about the importance of Big Data for the
present/future economy:

http://www.mckinsey.com/mgi/publications/big_data/index.asp

Hoping R will be an important player in this area.

-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with optim()

2011-05-19 Thread Jonathan Daily
I cc'd r-help to get this message back on the board.

Again, you haven't really answered my questions. Comments inline.

On Thu, May 19, 2011 at 9:45 AM, chirine wolley
 wrote:
> First, thank you for your response...
> Actually I didn't write the entire code ...X, Y and p_tilde are of
> course all defined at the beginning of my code.

And without them, your code example is not very enlightening, since it
doesn't run.

> Normally the optimal values that I get should let me create a model for
> prediction. However the model currently obtained is not good at all

You still have provided neither a) what you expect nor b) what optim()
is returning. What context is normal?

> And I think that the fact  it gives me the exactly same values if I
> change the function to maximize shows that there's a problem in my code

While I can't say if this makes a difference or if you accounted for
it, optim minimizes by default as outlined in the ?optim page.
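A two-line illustration of that default (unrelated to the poster's model; just the minimise-vs-maximise point):

```r
f <- function(x) -(x - 2)^2   # a function with its maximum at x = 2
# optim() minimises by default; to maximise, either negate the objective
# or set control = list(fnscale = -1):
optim(0, function(x) -f(x), method = "BFGS")$par                 # ~ 2
optim(0, f, method = "BFGS", control = list(fnscale = -1))$par   # ~ 2
```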

>
>> Date: Thu, 19 May 2011 09:29:02 -0400
>> Subject: Re: [R] problem with optim()
>> From: biomathjda...@gmail.com
>> To: wolley.chir...@hotmail.com
>> CC: r-help@r-project.org
>>
>> What do you mean when you say "wrong results"? What do you expect for
>> the output? Your code doesn't work for me because it references X in
>> places and X is not defined.
>>
>> Have you tested your functions to make sure they return reasonable values?
>>
>> On Thu, May 19, 2011 at 9:17 AM, chirine wolley
>>  wrote:
>> >
>> > Dear R-users,
>> >
>> > I would like to maximize the function g below, which depends on 4
>> > parameters (2 vectors, 1 real number, and 1 matrix)  using optim() and BFGS
>> > method. Here is my code:
>> >
>> > # function to maximize
>> >
>> > g=function(x)
>> > {
>> > x1 = x[1:ncol(X)]
>> > x2 = x[(ncol(X)+1)]
>> > x3 =
>> > matrix(x[(ncol(X)+2):(ncol(X)+1+ncol(X)*ncol(Y))],nrow=ncol(X),ncol=ncol(Y))
>> > x4 = x[(ncol(X)+1+ncol(X)*ncol(Y)+1):length(x)]
>> > res1=rep(0,nrow(X))
>> > res2=matrix(0,nrow=nrow(X),ncol=ncol(Y))
>> > log.res2=matrix(0,nrow=nrow(X),ncol=ncol(Y))
>> > res2.b=rep(0,nrow(X))
>> > res3 = rep(0,nrow(X))
>> > res3.b = rep(0,nrow(X))
>> > for (i in 1:nrow(X))
>> > {
>> > res1[i]=1/(1+exp(-t(x1)%*%X[i,]-x2))
>> > for (t in 1:ncol(Y))
>> > {
>> > res2[i,t] =
>> > ((1-(1+exp(-t(x3[,t])%*%X[i,]-x4[t]))^(-1))^(abs(Y[i,t]-Yb[i])))*(((1+exp(-t(x3[,t])%*%X[i,]-x4[t]))^(-1))^(1-abs(Y[i,t]-Yb[i])))
>> > log.res2[i,t]=log(res2[i,t])
>> > res2.b[i]=res2.b[i]+log.res2[i,t]
>> > }
>> > res3[i] = p_tilde[i]*log(res1[i])
>> > res3.b[i] = p_tilde[i]*(res2.b[i])
>> > }
>> > -(ncol(Y)*sum(res3)+sum(res3.b))
>> >
>> > }
>> >
>> > # Gradients:
>> >
>> > gr=function(x)
>> > {
>> > x1 = x[1:ncol(X)]
>> > x2 = x[(ncol(X)+1)]
>> > x3 =
>> > matrix(x[(ncol(X)+2):(ncol(X)+1+ncol(X)*ncol(Y))],nrow=ncol(X),ncol=ncol(Y))
>> > x4 = x[(ncol(X)+1+ncol(X)*ncol(Y)+1):length(x)]
>> > gr1 = rep(0,ncol(X))
>> > gr4 = rep(0,ncol(Y))
>> > gr3 = matrix(0,nrow=ncol(X),ncol=ncol(Y))
>> > gr1.b = matrix(0,nrow=nrow(X),ncol=ncol(X))
>> > gr2.b = rep(0,nrow(X))
>> > eta = matrix(0,nrow=nrow(X),ncol=ncol(Y))
>> > d.eta.3 = array(0,dim=c(nrow(X),ncol(X),ncol(Y)))
>> > d.eta.4 = matrix(0,nrow=nrow(X),ncol=ncol(Y))
>> > gr3.b1 = array(0,dim=c(nrow(X),ncol(X),ncol(Y)))
>> > gr4.b1 = matrix(0,nrow=nrow(X),ncol=ncol(Y))
>> >
>> > # Gradient of alpha and beta
>> >
>> > for (i in 1:nrow(X))
>> > {
>> > gr1.b[i,] =
>> > (2*p_tilde[i]-1)*((exp(-t(x1)%*%X[i,]-x2)*X[i,])/(1+exp(-t(x1)%*%X[i,]-x2))^2)
>> > gr2.b[i] =
>> > (2*p_tilde[i]-1)*((exp(-t(x1)%*%X[i,]-x2))/(1+exp(-t(x1)%*%X[i,]-x2))^2)
>> > }
>> > for (j in 1:ncol(X))
>> > {
>> > gr1[j] = sum(gr1.b[,j])
>> > }
>> > gr2 = sum(gr2.b)
>> >
>> >
>> > # Gradient of w and gamma
>> > for (i in 1:nrow(X))
>> > {
>> > for (t in 1:ncol(Y))
>> > {
>> > eta[i,t] = 1/(1+exp(-t(x3[,t])%*%X[i,]-x4[t]))
>> > d.eta.3[i,,t] = eta[i,t]*(1-eta[i,t])*X[i,]
>> > d.eta.4[i,t] = eta[i,t]*(1-eta[i,t])
>> > gr3.b1[i,,t] =
>> > p_tilde[i]*((-abs(Y[i,t]-Yb[i]))*(1-eta[i,t])^(-1)+(1-abs(Y[i,t]-Yb[i]))*
>> > (eta[i,t])^(-1))*d.eta.3[i,,t]
>> > gr4.b1[i,t] =
>> > p_tilde[i]*((-abs(Y[i,t]-Yb[i]))*(1-eta[i,t])^(-1)+(1-abs(Y[i,t]-Yb[i]))*
>> > (eta[i,t])^(-1))*d.eta.4[i,t]
>> > }
>> > }
>> > for (t in 1:ncol(Y))
>> > {
>> > for (j in 1:ncol(X))
>> > {
>> > gr3[j,t] = sum(gr3.b1[,j,t])
>> > }
>> > gr4[t] = sum(gr4.b1[,t])
>> > }
>> > c(-gr1,-gr2,-gr3,-gr4)
>> > }
>> >
>> > opt = optim(c(alpha[,c+1],beta[c+1],w,gamma),g,gr,method="BFGS")
>> >
>> > The problem is that it gives me wrong results, and I have noticed that
>> > if I change my function to maximize (for example if, instead of
>> > -(ncol(Y)*sum(res3)+sum(res3.b)), I try to maximise -(ncol(Y)*sum(res3)), 
>> > it
>> > gives me the exactly same results...which is not possible!
>> > So maybe I am using optim() in a wrong way...Does someone have an idea
>> > what could be wrong in my code ?
>> >
>> > Thank you very much in advance

Re: [R] balanced panel data

2011-05-19 Thread Cecilia Carmo
It works!

 

Thank you.

 

Cecília

 

De: Scott Chamberlain [mailto:scttchamberla...@gmail.com] 
Enviada: quinta-feira, 19 de Maio de 2011 13:40
Para: Cecilia Carmo
Cc: r-help@r-project.org
Assunto: Re: [R] balanced panel data

 

# If you know how many years are needed you could do this

makenewtable <- function(x, years) {
  xlist <- split(x, x$firm)
  dat <- lapply(xlist, function(z) if (length(unique(z$year)) == years) z)
  dat_ <- do.call(rbind, dat)
  return(dat_)
}

makenewtable(finaldata, 5)
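An equivalent base-R selection without a helper function (rebuilding the example data so the snippet stands alone):

```r
# Keep only the firms observed in the maximum number of distinct years.
data1 <- data.frame(firm = rep(1:3, each = 5), year = rep(2000:2004, 3), X = rnorm(15))
data2 <- data.frame(firm = rep(4:6, each = 3), year = rep(2001:2003, 3), X = rnorm(9))
finaldata <- rbind(data1, data2)

nyears   <- tapply(finaldata$year, finaldata$firm, function(y) length(unique(y)))
keep     <- names(nyears)[nyears == max(nyears)]
balanced <- finaldata[finaldata$firm %in% keep, ]
unique(balanced$firm)   # 1 2 3 -- the firms with all five years
```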

 

 

Scott

On Thursday, May 19, 2011 at 6:24 AM, Cecilia Carmo wrote:

I have a dataframe with many firm-year observations and many variables. 

Not all firms have information for all the years.

I want another dataframe with only those firms that have information all
years.

This is, I want a balanced panel data, but with the maximum number of years.

In my reproducible example I want to keep firms 1,2 and 3 (period 2000 to
2004). 



I need your help to create a code for this.



Thank you very much,



Cecília Carmo

(Universidade de Aveiro)





#My reproducible example:

firm<-sort(rep(1:3,5),decreasing=F)

year<-rep(2000:2004,3)

X<-rnorm(15)

data1<-data.frame(firm,year,X)

data1



firm<-sort(rep(4:6,3),decreasing=F)

year<-rep(2001:2003,3)

X<-rnorm(9)

data2<-data.frame(firm,year,X)

data2



finaldata<-rbind(data1,data2)

finaldata








Re: [R] matrix help (first occurrence of variable in column)

2011-05-19 Thread Michael Denslow
On Wed, May 18, 2011 at 9:49 PM, jim holtman  wrote:
> Is this what you were after:
>
>> mdat <- matrix(c(1,0,1,1,1,0), nrow = 2, ncol=3, byrow=TRUE,
> +               dimnames = list(c("T1", "T2"),
> +                               c("sp.1", "sp.2", "sp.3")))
>>
>> mdat
>   sp.1 sp.2 sp.3
> T1    1    0    1
> T2    1    1    0
>> # do 'rle' on each column and see if it is length >1 and starts with zero
>> mdat.df <- as.data.frame(mdat)
>> new.spec <- sapply(mdat.df, function(x){
> +     x.rle <- rle(x)
> +     (length(x.rle$values) > 1) & (x.rle$values[1L] == 0)
> + })
>> names(mdat.df)[new.spec]
> [1] "sp.2"
>>

Thanks for your reply!
This is close to what I want, but I think it only works if there is
two rows. My actual data could have up to 8 rows (time samples).

An example with 4 rows:

mdat3 <- matrix(c(1,0,0,1,0,1,0,0,1,0,1,1,1,0,0,1,0,1,1,1), nrow = 4,
ncol=5, byrow=TRUE,
   dimnames = list(c("T1", "T2",'T3','T4'),
   c("sp.1", "sp.2", "sp.3","sp.4","sp.5")))

mdat3

mdat.df <- as.data.frame(mdat3)
new.spec <- sapply(mdat.df, function(x){
x.rle <- rle(x)
(length(x.rle$values) > 1) & (x.rle$values[1L] == 0)
})

names(mdat.df)[new.spec]

It should say sp.5 since all the other species have occurred in other
samples. Any further help would be much appreciated.
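One way to handle an arbitrary number of rows (a sketch, not an answer from the thread) is to record each species' first-occurrence row, then tabulate:

```r
# Same 4-row example as above.
mdat3 <- matrix(c(1,0,0,1,0, 1,0,0,1,0, 1,1,1,0,0, 1,0,1,1,1),
                nrow = 4, ncol = 5, byrow = TRUE,
                dimnames = list(paste0("T", 1:4), paste0("sp.", 1:5)))

first.occ <- apply(mdat3, 2, function(x) which(x == 1)[1])
names(first.occ)[first.occ == nrow(mdat3)]               # "sp.5" is new at T4
table(factor(first.occ, levels = seq_len(nrow(mdat3))))  # new species per sample
```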


>
> On Wed, May 18, 2011 at 9:37 AM, Michael Denslow
>  wrote:
>> Dear R help,
>> Apologies for the less than informative subject line. I will do my
>> best to describe my problem.
>>
>> Consider the following matrix:
>>
>> mdat <- matrix(c(1,0,1,1,1,0), nrow = 2, ncol=3, byrow=TRUE,
>>               dimnames = list(c("T1", "T2"),
>>                               c("sp.1", "sp.2", "sp.3")))
>>
>> mdat
>>
>> In my actual data I have time (rows) and species occurrences (0/1
>> values, columns). I want to count the number of new species that occur
>> at a given time sample. For the matrix above the answer would be 1.
>>
>> Is there a simple way to figure out if the species has never occurred
>> before and then sum them up?
>>
>> Thanks in advance,
>> Michael
>>
>> --
>> Michael Denslow
>>
>> I.W. Carpenter Jr. Herbarium [BOON]
>> Department of Biology
>> Appalachian State University
>> Boone, North Carolina U.S.A.
>> -- AND --
>> Communications Manager
>> Southeast Regional Network of Expertise and Collections
>> sernec.org
>>
>> 36.214177, -81.681480 +/- 3103 meters
>>
>>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
>



-- 
Michael Denslow

I.W. Carpenter Jr. Herbarium [BOON]
Department of Biology
Appalachian State University
Boone, North Carolina U.S.A.
-- AND --
Communications Manager
Southeast Regional Network of Expertise and Collections
sernec.org

36.214177, -81.681480 +/- 3103 meters



[R] within()

2011-05-19 Thread Timothy Bates
The most interesting thing in this thread for me was "within()" - that is what 
I want 99% of the time, but I had only ever heard of with()

A real boon! Thanks Bill V!

The help for within gives this example for with(), which seems unnecessary, as 
glm supports data=

with(anorexia, {
anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian)
summary(anorex.1)
})

especially when this is easier to read:
  m <- glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian, 
data=anorexia)
  summary(m)

The Rd file also has (just?) one example for within… I suggest deleting the 
anorexia example so the within example comes first, and perhaps supplementing 
with

## Not run: 
# More advanced example of usage (not run)
model1  <- glm.nb(Cells ~ Cryogel*Day, data = myData)
myData2 <- within(myData, Cryogel <- relevel(Cryogel, ref = "2"))
model2  <- update(model1, data = myData2) 
## End(Not run)


Also, the help page for transform does not link back to within(). That would be 
helpful. It also uses an example with attach(), which might encourage less good 
habits.

Best wishes, and thanks,
tim
PS: How does one best contribute to help files?
PPS: If R was to go to google summer of code, apart from helpful help, my 
wishlist would prioritise standardising all common functions to (where possible)
1. Have a formula interface
2. Support "data="
3. Use a uniform na handling, with examples

Diverse NA behavior catches most of our students out (all three departments 
of our school now use R exclusively at the PG level).
NA behaviour can be:
na.rm=, as in rowMeans(x, na.rm = TRUE)
use=, as in cor(use = "everything")   # with well-documented effects in 
?cor, and all legal options shown 
(everything|all.obs|complete.obs|na.or.complete|pairwise.complete.obs)
na.action=, as in t.test(na.action = "na.omit")   # with no examples in ?t.test, 
nor a list of legal values

Likewise it would help reduce confusion when
plot(~Source, data=Df) # works
# but
boxplot(~Source, data=Df)
# Error in boxplot.formula(~Source, data = Df) : 
#  'formula' missing or incorrect

The formula isn't missing or ill-formed; it's that boxplot() requires formulae to 
have something on the left-hand side. 

best, t


Re: [R] problem with optim()

2011-05-19 Thread Jonathan Daily
What do you mean when you say "wrong results"? What do you expect for
the output? Your code doesn't work for me because it references X in
places and X is not defined.

Have you tested your functions to make sure they return reasonable values?

On Thu, May 19, 2011 at 9:17 AM, chirine wolley
 wrote:
>
> Dear R-users,
>
> I would like to maximize the function g below, which depends on 4 parameters 
> (2 vectors, 1 real number, and 1 matrix)  using optim() and BFGS method. Here 
> is my code:
>
> # function to maximize
>
> g=function(x)
> {
> x1 = x[1:ncol(X)]
> x2 = x[(ncol(X)+1)]
> x3 = 
> matrix(x[(ncol(X)+2):(ncol(X)+1+ncol(X)*ncol(Y))],nrow=ncol(X),ncol=ncol(Y))
> x4 = x[(ncol(X)+1+ncol(X)*ncol(Y)+1):length(x)]
> res1=rep(0,nrow(X))
> res2=matrix(0,nrow=nrow(X),ncol=ncol(Y))
> log.res2=matrix(0,nrow=nrow(X),ncol=ncol(Y))
> res2.b=rep(0,nrow(X))
> res3 = rep(0,nrow(X))
> res3.b = rep(0,nrow(X))
> for (i in 1:nrow(X))
> {
> res1[i]=1/(1+exp(-t(x1)%*%X[i,]-x2))
> for (t in 1:ncol(Y))
> {
> res2[i,t] = 
> ((1-(1+exp(-t(x3[,t])%*%X[i,]-x4[t]))^(-1))^(abs(Y[i,t]-Yb[i])))*(((1+exp(-t(x3[,t])%*%X[i,]-x4[t]))^(-1))^(1-abs(Y[i,t]-Yb[i])))
> log.res2[i,t]=log(res2[i,t])
> res2.b[i]=res2.b[i]+log.res2[i,t]
> }
> res3[i] = p_tilde[i]*log(res1[i])
> res3.b[i] = p_tilde[i]*(res2.b[i])
> }
> -(ncol(Y)*sum(res3)+sum(res3.b))
>
> }
>
> # Gradients:
>
> gr=function(x)
> {
> x1 = x[1:ncol(X)]
> x2 = x[(ncol(X)+1)]
> x3 = 
> matrix(x[(ncol(X)+2):(ncol(X)+1+ncol(X)*ncol(Y))],nrow=ncol(X),ncol=ncol(Y))
> x4 = x[(ncol(X)+1+ncol(X)*ncol(Y)+1):length(x)]
> gr1 = rep(0,ncol(X))
> gr4 = rep(0,ncol(Y))
> gr3 = matrix(0,nrow=ncol(X),ncol=ncol(Y))
> gr1.b = matrix(0,nrow=nrow(X),ncol=ncol(X))
> gr2.b = rep(0,nrow(X))
> eta = matrix(0,nrow=nrow(X),ncol=ncol(Y))
> d.eta.3 = array(0,dim=c(nrow(X),ncol(X),ncol(Y)))
> d.eta.4 = matrix(0,nrow=nrow(X),ncol=ncol(Y))
> gr3.b1 = array(0,dim=c(nrow(X),ncol(X),ncol(Y)))
> gr4.b1 = matrix(0,nrow=nrow(X),ncol=ncol(Y))
>
> # Gradient of alpha and beta
>
> for (i in 1:nrow(X))
> {
> gr1.b[i,] = 
> (2*p_tilde[i]-1)*((exp(-t(x1)%*%X[i,]-x2)*X[i,])/(1+exp(-t(x1)%*%X[i,]-x2))^2)
> gr2.b[i] = 
> (2*p_tilde[i]-1)*((exp(-t(x1)%*%X[i,]-x2))/(1+exp(-t(x1)%*%X[i,]-x2))^2)
> }
> for (j in 1:ncol(X))
> {
> gr1[j] = sum(gr1.b[,j])
> }
> gr2 = sum(gr2.b)
>
>
> # Gradient of w and gamma
> for (i in 1:nrow(X))
> {
> for (t in 1:ncol(Y))
> {
> eta[i,t] = 1/(1+exp(-t(x3[,t])%*%X[i,]-x4[t]))
> d.eta.3[i,,t] = eta[i,t]*(1-eta[i,t])*X[i,]
> d.eta.4[i,t] = eta[i,t]*(1-eta[i,t])
> gr3.b1[i,,t] = 
> p_tilde[i]*((-abs(Y[i,t]-Yb[i]))*(1-eta[i,t])^(-1)+(1-abs(Y[i,t]-Yb[i]))*
> (eta[i,t])^(-1))*d.eta.3[i,,t]
> gr4.b1[i,t] = 
> p_tilde[i]*((-abs(Y[i,t]-Yb[i]))*(1-eta[i,t])^(-1)+(1-abs(Y[i,t]-Yb[i]))*
> (eta[i,t])^(-1))*d.eta.4[i,t]
> }
> }
> for (t in 1:ncol(Y))
> {
> for (j in 1:ncol(X))
> {
> gr3[j,t] = sum(gr3.b1[,j,t])
> }
> gr4[t] = sum(gr4.b1[,t])
> }
> c(-gr1,-gr2,-gr3,-gr4)
> }
>
> opt = optim(c(alpha[,c+1],beta[c+1],w,gamma),g,gr,method="BFGS")
>
> The problem is that it gives me wrong results, and I have noticed that if I 
> change my function to maximize (for example if, instead of 
> -(ncol(Y)*sum(res3)+sum(res3.b)), I try to maximise -(ncol(Y)*sum(res3)), it 
> gives me the exactly same results...which is not possible!
> So maybe I am using optim() in a wrong way...Does someone have an idea what 
> could be wrong in my code ?
>
> Thank you very much in advance
>



-- 
===
Jon Daily
Technician
===
#!/usr/bin/env outside
# It's great, trust me.



[R] problem with optim()

2011-05-19 Thread chirine wolley

Dear R-users,
 
I would like to maximize the function g below, which depends on 4 parameters (2 
vectors, 1 real number, and 1 matrix), using optim() and the BFGS method. Here is 
my code:
 
# function to maximize
 
g=function(x)
{
x1 = x[1:ncol(X)]
x2 = x[(ncol(X)+1)]
x3 = 
matrix(x[(ncol(X)+2):(ncol(X)+1+ncol(X)*ncol(Y))],nrow=ncol(X),ncol=ncol(Y))
x4 = x[(ncol(X)+1+ncol(X)*ncol(Y)+1):length(x)]
res1=rep(0,nrow(X))
res2=matrix(0,nrow=nrow(X),ncol=ncol(Y))
log.res2=matrix(0,nrow=nrow(X),ncol=ncol(Y))
res2.b=rep(0,nrow(X))
res3 = rep(0,nrow(X))
res3.b = rep(0,nrow(X))
for (i in 1:nrow(X))
{
res1[i]=1/(1+exp(-t(x1)%*%X[i,]-x2))
for (t in 1:ncol(Y))
{
res2[i,t] = 
((1-(1+exp(-t(x3[,t])%*%X[i,]-x4[t]))^(-1))^(abs(Y[i,t]-Yb[i])))*(((1+exp(-t(x3[,t])%*%X[i,]-x4[t]))^(-1))^(1-abs(Y[i,t]-Yb[i])))
 
log.res2[i,t]=log(res2[i,t])
res2.b[i]=res2.b[i]+log.res2[i,t]
}
res3[i] = p_tilde[i]*log(res1[i])
res3.b[i] = p_tilde[i]*(res2.b[i])
}
-(ncol(Y)*sum(res3)+sum(res3.b))

}
# Gradients:
 
gr=function(x)
{
x1 = x[1:ncol(X)]
x2 = x[(ncol(X)+1)]
x3 = 
matrix(x[(ncol(X)+2):(ncol(X)+1+ncol(X)*ncol(Y))],nrow=ncol(X),ncol=ncol(Y))
x4 = x[(ncol(X)+1+ncol(X)*ncol(Y)+1):length(x)]
gr1 = rep(0,ncol(X))
gr4 = rep(0,ncol(Y))
gr3 = matrix(0,nrow=ncol(X),ncol=ncol(Y))
gr1.b = matrix(0,nrow=nrow(X),ncol=ncol(X))
gr2.b = rep(0,nrow(X))
eta = matrix(0,nrow=nrow(X),ncol=ncol(Y))
d.eta.3 = array(0,dim=c(nrow(X),ncol(X),ncol(Y)))
d.eta.4 = matrix(0,nrow=nrow(X),ncol=ncol(Y))
gr3.b1 = array(0,dim=c(nrow(X),ncol(X),ncol(Y)))
gr4.b1 = matrix(0,nrow=nrow(X),ncol=ncol(Y))
# Gradient of alpha and beta

for (i in 1:nrow(X))
{
gr1.b[i,] = 
(2*p_tilde[i]-1)*((exp(-t(x1)%*%X[i,]-x2)*X[i,])/(1+exp(-t(x1)%*%X[i,]-x2))^2)
gr2.b[i] = 
(2*p_tilde[i]-1)*((exp(-t(x1)%*%X[i,]-x2))/(1+exp(-t(x1)%*%X[i,]-x2))^2)
}
for (j in 1:ncol(X))
{
gr1[j] = sum(gr1.b[,j])
}
gr2 = sum(gr2.b) 
 

# Gradient of w and gamma
for (i in 1:nrow(X))
{
for (t in 1:ncol(Y))
{
eta[i,t] = 1/(1+exp(-t(x3[,t])%*%X[i,]-x4[t]))
d.eta.3[i,,t] = eta[i,t]*(1-eta[i,t])*X[i,]
d.eta.4[i,t] = eta[i,t]*(1-eta[i,t])
gr3.b1[i,,t] = 
p_tilde[i]*((-abs(Y[i,t]-Yb[i]))*(1-eta[i,t])^(-1)+(1-abs(Y[i,t]-Yb[i]))*
(eta[i,t])^(-1))*d.eta.3[i,,t]
gr4.b1[i,t] = 
p_tilde[i]*((-abs(Y[i,t]-Yb[i]))*(1-eta[i,t])^(-1)+(1-abs(Y[i,t]-Yb[i]))*
(eta[i,t])^(-1))*d.eta.4[i,t]
} 
}
for (t in 1:ncol(Y))
{
for (j in 1:ncol(X))
{
gr3[j,t] = sum(gr3.b1[,j,t])
}
gr4[t] = sum(gr4.b1[,t])
}
c(-gr1,-gr2,-gr3,-gr4)
}
 
opt = optim(c(alpha[,c+1],beta[c+1],w,gamma),g,gr,method="BFGS")
 
The problem is that it gives me wrong results, and I have noticed that if I 
change my function to maximize (for example if, instead of 
-(ncol(Y)*sum(res3)+sum(res3.b)), I try to maximise -(ncol(Y)*sum(res3)), it 
gives me the exactly same results...which is not possible!
So maybe I am using optim() in a wrong way...Does someone have an idea what 
could be wrong in my code ?
 
Thank you very much in advance


[R] Specifying Splits When Using rpart

2011-05-19 Thread Bruce Johnson
I am using the package rpart to explore various classification structures.  

 

The call looks like: 

 

seekhi1 <- rpart(pvol ~ spec + a1 + psize + eppres + numpt + icds + bivalcrt +
                   stents + ppshare + nhosp + nyrs,
                 data = dat, method = "class",
                 control = rpart.control(minsplit = 30, xval = 10))

 

The output is

1) root 198 87 1 (0.5606061 0.4393939)  

   2) psize=1,2 122 43 1 (0.6475410 0.3524590)  

 4) a1=3 23  5 1 (0.7826087 0.2173913) *

 5) a1=1,2 99 38 1 (0.6161616 0.3838384)  

  10) numpt=1 53 16 1 (0.6981132 0.3018868) *

  11) numpt=2 46 22 1 (0.5217391 0.4782609)  

22) nhosp=1 18  5 1 (0.7222222 0.2777778) *

23) nhosp=2 28 11 2 (0.3928571 0.6071429) *

   3) psize=3 76 32 2 (0.4210526 0.5789474)  

 6) spec=3 16  5 1 (0.6875000 0.3125000) *

 7) spec=1,2 60 21 2 (0.3500000 0.6500000)  

  14) bivalcrt=2 34 16 2 (0.4705882 0.5294118)  

28) stents=2 17  6 1 (0.6470588 0.3529412) *

29) stents=1 17  5 2 (0.2941176 0.7058824) *

  15) bivalcrt=1 26  5 2 (0.1923077 0.8076923) *

 

I'd like to force rpart to use a specific variable, namely "spec" as the
first split and then allow it to proceed as usual.  Is there a way to do
this?

 

Thanks,

Bruce
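rpart has no argument that pins the root split to a chosen variable. One common workaround is to make that split by hand and grow one tree per subset; a sketch on the built-in iris data (a stand-in for the poster's dat, since that isn't available) looks like this. The cost argument of rpart() can also discourage, though not forbid, competing variables at the root.

```r
library(rpart)
# "First split" on Species is forced by splitting the data manually,
# then fitting a separate tree within each subset.
trees <- lapply(split(iris, iris$Species), function(d)
  rpart(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = d))
sapply(trees, function(tr) nrow(tr$frame))  # size of each per-subset tree
```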

 

 




Re: [R] Problems with unsplit()

2011-05-19 Thread Scott Chamberlain
You generally aren't allowed to have duplicate row names in a data frame in R. 
If you want to keep your rownames you can make them a column in the data frame 
before unsplitting. 
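That suggestion, sketched on a toy frame (not the poster's data):

```r
df <- data.frame(x = 1:4, g = c("a", "b", "a", "b"),
                 row.names = c("r1", "r2", "r3", "r4"))
df$orig.row <- rownames(df)   # stash the row names as an ordinary column
s   <- split(df, df$g)
df2 <- unsplit(s, df$g)       # reassembles in the original row order
df2$orig.row                  # "r1" "r2" "r3" "r4"
```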

Scott
On Thursday, May 19, 2011 at 5:27 AM, Cecilia Carmo wrote: 
> Hi everyone,
> 
> 
> 
> I have already used split() and unsplit() in data frames without problems,
> but now I’m applying these functions to other data and when using unsplit()
> I have received the following message:
> 
> 
> 
> Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "2", "3", "4", : 
> 
>  duplicate 'row.names' are not allowed
> 
> In addition: Warning message:
> 
> non-unique values when setting 'row.names': ‘’
> 
> 
> 
> Could anyone explain me what does this mean?
> 
> 
> 
> Thank you in advance,
> 
> 
> 
> Cecília Carmo
> 
> (Universidade de Aveiro - Portugal)
> 
> 



Re: [R] balanced panel data

2011-05-19 Thread Scott Chamberlain
# If you know how many years are needed you could do this
makenewtable <- function(x, years) {
  xlist <- split(x, x$firm)
  # NULL for firms without a full run of years; rbind() drops the NULLs
  dat <- lapply(xlist, function(z) if (length(unique(z$year)) == years) z)
  do.call(rbind, dat)
}
makenewtable(finaldata, 5)
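An equivalent vectorised route, without splitting, is to count the distinct years per firm with ave() and keep the firms that reach the maximum (this assumes the balanced period is the one covered by the most complete firms, as in Cecilia's example):

```r
# Cecilia's reproducible example
set.seed(1)
data1 <- data.frame(firm = sort(rep(1:3, 5)), year = rep(2000:2004, 3),
                    X = rnorm(15))
data2 <- data.frame(firm = sort(rep(4:6, 3)), year = rep(2001:2003, 3),
                    X = rnorm(9))
finaldata <- rbind(data1, data2)

# number of distinct years observed for each firm, repeated per row
nyears <- ave(finaldata$year, finaldata$firm,
              FUN = function(y) length(unique(y)))
balanced <- finaldata[nyears == max(nyears), ]
unique(balanced$firm)  # 1 2 3
```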


Scott 
On Thursday, May 19, 2011 at 6:24 AM, Cecilia Carmo wrote:
> I have a data frame with many firm-year observations and many variables.
> 
> Not all firms have information for all the years.
> 
> I want another dataframe with only those firms that have information all
> years.
> 
> This is, I want a balanced panel data, but with the maximum number of years.
> 
> In my reproducible example I want to keep firms 1, 2, and 3 (period 2000 to
> 2004). 
> 
> 
> 
> I need your help to create a code for this.
> 
> 
> 
> Thank you very much,
> 
> 
> 
> Cecília Carmo
> 
> (Universidade de Aveiro)
> 
> 
> 
> 
> 
> #My reproducible example:
> 
> firm<-sort(rep(1:3,5),decreasing=F)
> 
> year<-rep(2000:2004,3)
> 
> X<-rnorm(15)
> 
> data1<-data.frame(firm,year,X)
> 
> data1
> 
> 
> 
> firm<-sort(rep(4:6,3),decreasing=F)
> 
> year<-rep(2001:2003,3)
> 
> X<-rnorm(9)
> 
> data2<-data.frame(firm,year,X)
> 
> data2
> 
> 
> 
> finaldata<-rbind(data1,data2)
> 
> finaldata
> 
> 



Re: [R] identical function names from 2 packages

2011-05-19 Thread Duncan Murdoch

On 18/05/2011 10:02 PM, Nick Matzke wrote:
> Hi,
>
> If I load 2 packages that have a function with the same
> name, how do I tell R to run one or the other?
>
> (Instead of having R automatically use the first- or
> last-loaded one, whichever it is.  (Which is it, by the way?))
>
> Cheers!
> Nick


It is the most recently attached one that would be chosen by default. 
Generally that's also most recently loaded, but there can be exceptions.


Others have pointed out that :: is used to specify the package to use. 
If you use that, you don't need to have the package attached:  it will 
be loaded but won't show up in the search path.
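A self-contained illustration of the masking and the :: fix, mirroring this thread's scenario with a deliberately broken lm() in the global environment:

```r
lm <- function(x, ...) stop("got you!")   # masks stats::lm on the search path

# a plain call now hits the mask
err <- tryCatch(lm(dist ~ speed, data = cars),
                error = function(e) conditionMessage(e))
err  # "got you!"

fit <- stats::lm(dist ~ speed, data = cars)  # '::' names the package explicitly
class(fit)  # "lm"

rm(lm)  # remove the mask; a plain lm() finds stats::lm again
```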


If you are using a package without a namespace, expect problems.  Having 
the namespace is nearly as good as automatically adding the pkg:: prefix 
to every call from functions within the package.  Not having the 
namespace is nearly as bad as never using the prefix, even when you should.


Duncan Murdoch



[R] Hi my best friends

2011-05-19 Thread arsalan fathi
Hi,
I need the Quasi-Newton and Nelder-Mead algorithms to estimate a
three-parameter function. I would really appreciate your help.
Thanks a lot.
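Both methods are available in base R's optim(): method = "BFGS" is a quasi-Newton algorithm, and "Nelder-Mead" is the default. A hedged toy example (the model and data are invented, since the post gives none), fitting three parameters by least squares:

```r
set.seed(1)
x <- seq(0, 2, length.out = 50)
y <- 2 * exp(0.8 * x) + 1 + rnorm(50, sd = 0.05)  # truth: a=2, b=0.8, c=1

# objective: residual sum of squares for y = a*exp(b*x) + c
sse <- function(p) sum((y - (p[1] * exp(p[2] * x) + p[3]))^2)

fit_nm   <- optim(c(1, 1, 0), sse)                   # Nelder-Mead (default)
fit_bfgs <- optim(c(1, 1, 0), sse, method = "BFGS")  # quasi-Newton
fit_bfgs$par  # close to c(2, 0.8, 1)
```

With a differentiable objective, BFGS usually converges in fewer evaluations; Nelder-Mead is more robust when the objective is noisy or non-smooth.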


