[R] [] and escaping in regular expressions

2013-09-24 Thread Juliet Hannah
Is it correct that one does not need to escape special characters such as
"*" (are these properly called metacharacters?) inside []? If so, what is
the logic behind this?

mytest <- "he*llo"
sub("[*]","",mytest)
sub("\\*","",mytest)

[] is easier for me to read than \\. Is this what people tend to use?
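
A minimal sketch of the behavior in question (base R only):

mytest <- "he*llo"
sub("[*]", "", mytest)   # "hello": '*' is treated literally inside a character class
sub("\\*", "", mytest)   # "hello": escaping the metacharacter also works
sub("[.]", "", "a.b")    # "ab":   '.' likewise loses its special meaning inside []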

Thanks.




Re: [R] mgcv: how select significant predictor vars when using gam(...select=TRUE) using automatic optimization

2013-04-24 Thread Juliet Hannah
Hi Jan and Simon,

If possible, could you attach the diagnostic plots. I would be curious to
see them.

Thanks,

Juliet


On Fri, Apr 19, 2013 at 4:39 AM, jholstei  wrote:

> Simon,
>
> that was very instructive—very special thanks to you.
> I already noticed that the model was bad, but it was not clear to me that
> transforming the predictors to, say, a more centered distribution would be
> helpful here.
> And thanks for pointing out Tweedie; I noticed that the error structure is
> far from normal and more like gamma or Poisson, but Gamma made things worse.
>
> Best regards,
> Jan
>
>
>
>
>
> On 18 Apr 2013, at 17:25, Simon Wood wrote:
>
> > Jan,
> >
> > Thanks for the data (off list). The p-value computations are based on
> the approximation that things are approximately normal on the linear
> predictor scale, but actually they are nowhere close to normal in this
> case, which is why the p-values look inconsistent. The reason that the
> approximate normality assumption doesn't hold is that the model is quite a
> poor fit. If you take a look at gam.check(fit) you'll see that the constant
> variance assumption of quasi(link=log) is violated quite badly, and the
> residual distribution is really quite odd (plot residuals against fitted as
> well). Also see plot(fit,pages=1,scale=0) - it shows ballooning confidence
> intervals and smooth estimates that are so low in places that they might as
> well be minus infinity (given log link) - clearly something is wrong with
> this model!
> >
> > I would be inclined to reset all the 0's to 0 (rather than 0.01), and
> then to try Tweedie(p=1.5,link=log) as the family. Also the predictor
> variables are very skewed which is giving leverage problems, so I would
> transform them to give less skew. e.g. Something like
> >
> > fit<-gam(target~s(log(mgs))+s(I(gsd^.5))+s(I(mud^.25))+s(log(ssCmax)),
> > family=Tweedie(p=1.6,link=log),data=df,method="REML")
> >
> > gives a model that is closer to being reasonable (p-values are then
> consistent between select=TRUE and FALSE).
> >
> > best,
> > Simon
> >
> > On 18/04/13 14:24, Simon Wood wrote:
> >> Jan,
> >>
> >> Thanks for this. Is there any chance that you could send me the data off
> >> list and I'll try to figure out what is happening? (Under the
> >> understanding that I'll only use the data for investigating this issue,
> >> of course).
> >>
> >> best,
> >> Simon
> >>
> >> on 18/04/13 11:11, Jan Holstein wrote:
> >>> Simon,
> >>>
> >>> thanks for the reply,  I guess I'm pretty much up to date using
> >>>  mgcv 1.7-22.
> >>> Upgrading to R 3.0.0 also didn't do any change.
> >>>
> >>> Unfortunately using method="REML" does not make any difference:
> >>>
> >>> ### first with "select=FALSE"
> >>> fit <- gam(target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax),
> >>>            family = quasi(link = log), data = wspe1,
> >>>            method = "REML", select = F)
> >>> summary(fit)
> >>>
> >>> Family: quasi
> >>> Link function: log
> >>> Formula:
> >>> target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax)
> >>> Parametric coefficients:
> >>>             Estimate Std. Error t value Pr(>|t|)
> >>> (Intercept)   -4.724      7.462  -0.633    0.527
> >>> Approximate significance of smooth terms:
> >>>             edf Ref.df      F p-value
> >>> s(mgs)    3.118  3.492  0.099   0.974
> >>> s(gsd)    6.377  7.044 15.596  <2e-16 ***
> >>> s(mud)    8.837  8.971 18.832  <2e-16 ***
> >>> s(ssCmax) 3.886  4.051  2.342   0.052 .
> >>> ---
> >>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >>> R-sq.(adj) =  0.403   Deviance explained = 40.6%
> >>> REML score =  33186  Scale est. = 8.7812e+05  n = 4511
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>  Then using "select=T"
> >>>
> >>> fit2 <- gam(target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax),
> >>>             family = quasi(link = log), data = wspe1,
> >>>             method = "REML", select = TRUE)
> >>> summary(fit2)
> >>> Family: quasi
> >>> Link function: log
> >>> Formula:
> >>> target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax)
> >>> Parametric coefficients:
> >>>             Estimate Std. Error t value Pr(>|t|)
> >>> (Intercept)   -6.406      5.239  -1.223    0.222
> >>> Approximate significance of smooth terms:
> >>>             edf Ref.df     F p-value
> >>> s(mgs)    2.844      8 25.43  <2e-16 ***
> >>> s(gsd)    6.071      9 14.50  <2e-16 ***
> >>> s(mud)    6.875      8 21.79  <2e-16 ***
> >>> s(ssCmax) 3.787      8 18.42  <2e-16 ***
> >>> ---
> >>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >>> R-sq.(adj) =  0.4   Deviance explained = 40.1%
> >>> REML score =  33203  Scale est. = 8.8359e+05  n = 4511
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> I played around with other families/link functions with no success
> >>> regarding
> >>> the "select" behaviour.
> >>>
> >>> Well, look at the structure of my data:
> >>> 
> >>>
> >>> All possible predictor variables in principle look like this, and taken
> >>> alone, each and every is significant according to p-value (but not all
>

Re: [R] Speeding reading of a large file

2012-12-06 Thread Juliet Hannah
Thanks, it does help. Is it possible to elaborate on specifically why this
syntax preserves dimensions? Is it correct to just say that even though
lapply returns a list, x[] forces x to have the same dimensions?
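
A minimal sketch of the difference (the object names here are made up):

x <- data.frame(a = c("1", "2"), b = c("3", "4"), stringsAsFactors = FALSE)
x[] <- lapply(x, as.numeric)  # assigns into the existing columns; x remains a data frame
str(x)                        # still a data.frame, columns now numeric
y <- lapply(x, as.numeric)    # without [], the result is a plain list
str(y)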

On Thu, Dec 6, 2012 at 11:53 AM, Rui Barradas  wrote:
> Hello,
>
> Because x[] keeps the dimensions, unlike just x.
>
> Hope this helps,
>
> Rui Barradas
> On 06-12-2012 16:24, Juliet Hannah wrote:
>
>> All,
>>
>> Can someone describe what
>>
>>   x[] <- lapply(x, as.numeric)
>>
>> does? I see that it is putting the list elements into a data frame. The
>> result of lapply is a list, so how does this become
>> a data frame?
>>
>> Thanks,
>>
>> Juliet
>>
>>
>> On Mon, Dec 3, 2012 at 5:49 PM, Fisher Dennis 
>> wrote:
>>>
>>> Colleagues,
>>>
>>> This past week, I asked the following question:
>>>
>>>  I have a file that looks that this:
>>>
>>>  TABLE NO.  1
>>>   PTIDTIMEAMT FORMPERIOD
>>> IPRED   CWRES   EVIDCP  PREDRES WRES
>>>2.0010E+03  3.9375E-01  5.E+03  2.E+00  0.E+00
>>> 0.E+00  0.E+00  1.E+00  0.E+00  0.E+00 0.E+00
>>> 0.E+00
>>>2.0010E+03  8.9583E-01  5.E+03  2.E+00  0.E+00
>>> 3.3389E+00  0.E+00  1.E+00  0.E+00  3.5321E+00 0.E+00
>>> 0.E+00
>>>2.0010E+03  1.4583E+00  5.E+03  2.E+00  0.E+00
>>> 5.8164E+00  0.E+00  1.E+00  0.E+00  5.9300E+00 0.E+00
>>> 0.E+00
>>>2.0010E+03  1.9167E+00  5.E+03  2.E+00  0.E+00
>>> 8.3633E+00  0.E+00  1.E+00  0.E+00  8.7011E+00 0.E+00
>>> 0.E+00
>>>2.0010E+03  2.4167E+00  5.E+03  2.E+00  0.E+00
>>> 1.0092E+01  0.E+00  1.E+00  0.E+00  1.0324E+01 0.E+00
>>> 0.E+00
>>>2.0010E+03  2.9375E+00  5.E+03  2.E+00  0.E+00
>>> 1.1490E+01  0.E+00  1.E+00  0.E+00  1.1688E+01 0.E+00
>>> 0.E+00
>>>2.0010E+03  3.4167E+00  5.E+03  2.E+00  0.E+00
>>> 1.2940E+01  0.E+00  1.E+00  0.E+00  1.3236E+01 0.E+00
>>> 0.E+00
>>>2.0010E+03  4.4583E+00  5.E+03  2.E+00  0.E+00
>>> 1.1267E+01  0.E+00  1.E+00  0.E+00  1.1324E+01 0.E+00
>>> 0.E+00
>>>
>>>  The file is reasonably large (> 10^6 lines) and the two line
>>> header is repeated periodically in the file.
>>>  I need to read this file in as a data frame.  Note that the
>>> number of columns, the column headers, and the number of replicates of the
>>> headers are not known in advance.
>>>
>>> I received a number of replies, many of them quite useful.  Of these, one
>>> beat out all the others in my benchmarking using files ranging from 10^5 to
>>> 10^6 lines.
>>> That version, provided by Jim Holtman, was:
>>>  x   <- read.table(FILE, as.is = TRUE, skip=1,
>>> fill=TRUE, header = TRUE)
>>>  x[] <- lapply(x, as.numeric)
>>>  x   <- x[!is.na(x[,1]), ]
>>>
>>> Other versions involved readLines, followed by edits, followed by cat
>>> (or write) to a temp file, then read.table again.
>>> The overhead with invoking readLines, write/cat, and read.table was
>>> substantially larger than the strategy of read.table / as.numeric / indexing
>>>
>>> Thanks for the input from many folks.
>>>
>>> Dennis
>>>
>>> Dennis Fisher MD
>>> P < (The "P Less Than" Company)
>>> Phone: 1-866-PLessThan (1-866-753-7784)
>>> Fax: 1-866-PLessThan (1-866-753-7784)
>>> www.PLessThan.com
>>>


Re: [R] Speeding reading of a large file

2012-12-06 Thread Juliet Hannah
All,

Can someone describe what

 x[] <- lapply(x, as.numeric)

does? I see that it is putting the list elements into a data frame. The
result of lapply is a list, so how does this become
a data frame?

Thanks,

Juliet


On Mon, Dec 3, 2012 at 5:49 PM, Fisher Dennis  wrote:
> Colleagues,
>
> This past week, I asked the following question:
>
> I have a file that looks that this:
>
> TABLE NO.  1
>  PTIDTIMEAMT FORMPERIOD  IPRED
>CWRES   EVIDCP  PREDRES WRES
>   2.0010E+03  3.9375E-01  5.E+03  2.E+00  0.E+00  
> 0.E+00  0.E+00  1.E+00  0.E+00  0.E+00 0.E+00  
> 0.E+00
>   2.0010E+03  8.9583E-01  5.E+03  2.E+00  0.E+00  
> 3.3389E+00  0.E+00  1.E+00  0.E+00  3.5321E+00 0.E+00  
> 0.E+00
>   2.0010E+03  1.4583E+00  5.E+03  2.E+00  0.E+00  
> 5.8164E+00  0.E+00  1.E+00  0.E+00  5.9300E+00 0.E+00  
> 0.E+00
>   2.0010E+03  1.9167E+00  5.E+03  2.E+00  0.E+00  
> 8.3633E+00  0.E+00  1.E+00  0.E+00  8.7011E+00 0.E+00  
> 0.E+00
>   2.0010E+03  2.4167E+00  5.E+03  2.E+00  0.E+00  
> 1.0092E+01  0.E+00  1.E+00  0.E+00  1.0324E+01 0.E+00  
> 0.E+00
>   2.0010E+03  2.9375E+00  5.E+03  2.E+00  0.E+00  
> 1.1490E+01  0.E+00  1.E+00  0.E+00  1.1688E+01 0.E+00  
> 0.E+00
>   2.0010E+03  3.4167E+00  5.E+03  2.E+00  0.E+00  
> 1.2940E+01  0.E+00  1.E+00  0.E+00  1.3236E+01 0.E+00  
> 0.E+00
>   2.0010E+03  4.4583E+00  5.E+03  2.E+00  0.E+00  
> 1.1267E+01  0.E+00  1.E+00  0.E+00  1.1324E+01 0.E+00  
> 0.E+00
>
> The file is reasonably large (> 10^6 lines) and the two line header 
> is repeated periodically in the file.
> I need to read this file in as a data frame.  Note that the number of 
> columns, the column headers, and the number of replicates of the headers are 
> not known in advance.
>
> I received a number of replies, many of them quite useful.  Of these, one 
> beat out all the others in my benchmarking using files ranging from 10^5 to 
> 10^6 lines.
> That version, provided by Jim Holtman, was:
> x   <- read.table(FILE, as.is = TRUE, skip=1, fill=TRUE, 
> header = TRUE)
> x[] <- lapply(x, as.numeric)
> x   <- x[!is.na(x[,1]), ]
>
> Other versions involved readLines, followed by edits, followed by cat (or
> write) to a temp file, then read.table again.
> The overhead with invoking readLines, write/cat, and read.table was 
> substantially larger than the strategy of read.table / as.numeric / indexing
>
> Thanks for the input from many folks.
>
> Dennis
>
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-866-PLessThan (1-866-753-7784)
> www.PLessThan.com
>


Re: [R] testing parallel slopes assumption for Ordinal Logistic Regression

2012-05-04 Thread Juliet Hannah
See the post by Frank Harrell at:

http://groups.google.com/group/medstats/browse_thread/thread/cbff7871179e9508?pli=1

or google

regrouping to satisfy proportional odds
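
One common informal check (a sketch only, not Frank Harrell's exact approach):
fit a separate binary logistic regression at each cutpoint of the ordinal
response and compare the slopes; roughly constant slopes are consistent with
proportional odds. The data below are made up.

y <- factor(sample(1:4, 200, replace = TRUE), ordered = TRUE)  # made-up ordinal outcome
x <- rnorm(200)                                                # made-up predictor
for (k in 2:nlevels(y)) {
  z <- as.integer(as.integer(y) >= k)          # binary indicator: response >= level k
  print(coef(glm(z ~ x, family = binomial)))   # compare the x slopes across cutpoints
}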

On Tue, May 1, 2012 at 2:14 AM, 80past2  wrote:
> Hi everyone, I'm a bit new here (and new to R), and I was trying to do an
> OLR, and testing the parallel slope assumption seems to be very important. I
> browsed through past postings, and didn't find much to help me in this area.
> I was wondering if anyone knew how I could go about doing this. Thank you.
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/testing-parallel-slopes-assumption-for-Ordinal-Logistic-Regression-tp4600234.html
> Sent from the R help mailing list archive at Nabble.com.
>


[R] resampling syntax for caret package

2012-04-06 Thread Juliet Hannah
Max and List,

Could you advise me if I am using the proper caret syntax to carry out
leave-one-out cross validation. In the example below, I use example
data from the rda package. I use caret to tune over a grid and select
an optimal value. I think I am then using the optimal selection for
prediction.  So there are two rounds of resampling with the first one
taken care of by caret's train function.

My overall question is whether it is true that I must carry out the outer
resampling plan manually.

On another note, I usually get the warning

1: In train.default(colon.x[-holdout, ], outcome[-holdout], method = "pam",  :
  At least one of the class levels are not valid R variables names;
This may cause errors if class probabilities are generated because the
variables names will be converted to: X1, X2
2: executing %dopar% sequentially: no parallel backend registered

When I change the variable names, caret gives me predictions as a
numeric value corresponding to the ordered level. Have I missed
something here?
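
Two small adjustments that, as far as I can tell, address those warnings (a
sketch; 'outcome' refers to the object created in the example below):

# give the outcome levels that are valid R names (e.g. "X1"/"X2"), so class
# probabilities can be generated without caret renaming them:
levels(outcome) <- make.names(levels(outcome))

# register a parallel backend (assuming the doParallel package is installed)
# to avoid the "%dopar% sequentially" note:
library(doParallel)
registerDoParallel(cores = 2)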


Thanks,

Juliet

# start example

library(caret)
# to obtain data
library(rda)

data(colon)

#  add colnames
myind <- seq(1:ncol(colon.x))
mynames <- paste("A",myind,sep="")
colnames(colon.x) <- mynames

outcome  <- factor(as.character(colon.y),levels=c("1","2"))

cv_index <- 1:length(outcome)
predictions <- rep(-1,length(cv_index))

pamGrid <- seq(0.1,5,by=0.2)
pamGrid <- data.frame(.threshold=pamGrid)

# manual leave-one-out
for (holdout in cv_index) {
pamFit1 <- train(colon.x[-holdout,], outcome[-holdout],
 method = "pam",
 tuneGrid= pamGrid,
 trControl = trainControl(method = "cv"))

predictions[holdout] = predict(pamFit1,newdata =
colon.x[holdout,,drop=FALSE])

}



# end example


> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
 [1] pamr_1.54survival_2.36-12 e1071_1.6class_7.3-3
 [5] rda_1.0.2caret_5.15-023   foreach_1.3.5codetools_0.2-8
 [9] iterators_1.0.5  cluster_1.14.2   reshape_0.8.4plyr_1.7.1
[13] lattice_0.20-6

loaded via a namespace (and not attached):
[1] compiler_2.14.2 grid_2.14.2 tools_2.14.2



Re: [R] glmnet: obtain predictions using predict and also by extracting coefficients

2012-03-21 Thread Juliet Hannah
Oops. Coefficients are returned on the scale of the original data.

testX <- cbind(1,data.test)
yhat2  <- testX %*% beta

# works

plot(yhat2,yhat_enet)
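
A quick numerical check on the same objects (should be TRUE up to tolerance):

all.equal(as.vector(yhat2), as.vector(yhat_enet))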


On Wed, Mar 21, 2012 at 2:35 PM, Juliet Hannah  wrote:
> All,
>
> For my understanding, I wanted to see if I can get glmnet predictions
> using both the predict function and also by multiplying coefficients
> by the variable matrix. This has not worked out. Could anyone suggest
> where I am going wrong?
> I understand that I may not have the mean/intercept correct, but the
> scaling is also off, which suggests a bigger mistake.
>
>  Thanks for your help.
>
> Juliet Hannah
>
>
> library(ElemStatLearn)
> library(glmnet)
>
> data(prostate)
>
> # training data
> data.train <- prostate[prostate$train,]
> y <- data.train$lpsa
>
> # isolate predictors
> data.train <- as.matrix(data.train[,-c(9,10)])
>
> # test data
> data.test <- prostate[!prostate$train,]
> data.test <-  as.matrix(data.test[,-c(9,10)])
>
> # scale test data  by using means and sd from training data
>
> trainMeans <- apply(data.train,2,mean)
> trainSDs <- apply(data.train,2,sd)
>
> # create standardized test data
>
> data.test.std <- sweep(data.test, 2, trainMeans)
> data.test.std <- sweep(data.test.std, 2, trainSDs, "/")
>
> # fit training model
>
> myglmnet =cv.glmnet(data.train,y)
>
> # predictions by using predict function
>
> yhat_enet <- predict(myglmnet,newx=data.test, s="lambda.min")
>
> # attempting to get predictions by using coefficients
>
> beta  <- as.vector( t(coef(myglmnet,s="lambda.min")))
>
> testX <- cbind(1,data.test.std)
>
> yhat2  <- testX %*% beta
>
> # does not match
>
> plot(yhat2,yhat_enet)



[R] glmnet: obtain predictions using predict and also by extracting coefficients

2012-03-21 Thread Juliet Hannah
All,

For my understanding, I wanted to see if I can get glmnet predictions
using both the predict function and also by multiplying coefficients
by the variable matrix. This has not worked out. Could anyone suggest
where I am going wrong?
I understand that I may not have the mean/intercept correct, but the
scaling is also off, which suggests a bigger mistake.

 Thanks for your help.

Juliet Hannah


library(ElemStatLearn)
library(glmnet)

data(prostate)

# training data
data.train <- prostate[prostate$train,]
y <- data.train$lpsa

# isolate predictors
data.train <- as.matrix(data.train[,-c(9,10)])

# test data
data.test <- prostate[!prostate$train,]
data.test <-  as.matrix(data.test[,-c(9,10)])

# scale test data  by using means and sd from training data

trainMeans <- apply(data.train,2,mean)
trainSDs <- apply(data.train,2,sd)

# create standardized test data

data.test.std <- sweep(data.test, 2, trainMeans)
data.test.std <- sweep(data.test.std, 2, trainSDs, "/")

# fit training model

myglmnet =cv.glmnet(data.train,y)

# predictions by using predict function

yhat_enet <- predict(myglmnet,newx=data.test, s="lambda.min")

# attempting to get predictions by using coefficients

beta  <- as.vector( t(coef(myglmnet,s="lambda.min")))

testX <- cbind(1,data.test.std)

yhat2  <- testX %*% beta

# does not match

plot(yhat2,yhat_enet)



Re: [R] Normalization in R

2012-01-24 Thread Juliet Hannah
For quantile normalization check out

normalize.quantiles in the Bioconductor preprocessCore package. Also,
there is a Bioconductor mailing list for future questions, where these
topics are discussed.

http://svitsrv25.epfl.ch/R-doc/library/preprocessCore/html/normalize.quantiles.html
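
A minimal sketch of the call (assuming preprocessCore is installed from
Bioconductor; the matrix is made up, with features in rows and samples in
columns):

library(preprocessCore)
mat  <- matrix(rexp(2000), nrow = 500, ncol = 4)
qmat <- normalize.quantiles(mat)   # columns now share the same empirical distribution
apply(qmat, 2, quantile)           # column quantiles agree after normalization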

On Sat, Jan 21, 2012 at 7:32 AM, elisacarli21  wrote:
> Dear all
>
> I need to apply a normal, mean and quantile normalization to a data matrix.
> Could you give me any suggestions?
>
> Bests
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Normalization-in-R-tp4315911p4315911.html
> Sent from the R help mailing list archive at Nabble.com.
>


Re: [R] Bioconductor. MA plot for qPCR array

2011-12-15 Thread Juliet Hannah
You may find the following discussion helpful.

http://comments.gmane.org/gmane.science.biology.informatics.conductor/37388
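
For two vectors of positive intensities, an MA plot is just the log-ratio (M)
against the average log-intensity (A); a minimal base-R sketch with made-up
values:

x <- rexp(400) + 1                 # assay 1 (made up)
y <- rexp(400) + 1                 # assay 2 (made up)
M <- log2(x) - log2(y)             # log ratio
A <- (log2(x) + log2(y)) / 2       # average log intensity
plot(A, M, pch = 16, cex = 0.6)
abline(h = 0, lty = 2)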

On Sun, Dec 11, 2011 at 8:08 AM, ali_protocol
 wrote:
> Dear all,
>
> Is there any way to generate an MA plot for 2 qPCR assays (an array of 2x 400)?
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Bioconductor-MA-plot-for-qPCR-array-tp4182805p4182805.html
> Sent from the R help mailing list archive at Nabble.com.
>


[R] aggregate syntax for grouped column means

2011-11-29 Thread Juliet Hannah
I am calculating the mean of each column grouped by the variable 'id'.
I do this using aggregate, data.table, and plyr. My aggregate results
do not match the other two, and I am trying to figure out what is
incorrect with my syntax. Any suggestions? Thanks.

Here is the data.

myData <- structure(list(var1 = c(31.59, 32.21, 31.78, 31.34, 31.61, 31.61,
30.59, 30.84, 30.98, 30.79, 30.79, 30.94, 31.08, 31.27, 31.11,
30.42, 30.37, 30.29, 30.06, 30.3, 30.43, 30.61, 30.64, 30.75,
30.39, 30.1, 30.25, 31.55, 31.96, 31.87, 30.29, 30.15, 30.37,
29.59, 29.52, 28.96, 29.69, 29.58, 29.52, 30.21, 30.3, 30.25,
30.23, 30.29, 30.39), var2 = c(33.78, 33.25, NA, 32.05, 32.59,
NA, 32.24, NA, NA, 32.15, 32.39, NA, 32.4, 31.6, NA, 30.5, 30.66,
NA, 30.6, 29.95, NA, 31.24, 30.73, NA, 30.51, 30.43, 31.17, 31.44,
31.17, 31.18, 31.01, 30.98, 31.25, 30.44, 30.47, NA, 30.47, 30.56,
NA, 30.6, 30.57, NA, 31, 30.8, NA), id = c("0m4", "0m4", "0m4",
"0m5", "0m5", "0m5", "0m6", "0m6", "0m6", "0m11", "0m11", "0m11",
"0m12", "0m12", "0m12", "205m1", "205m1", "205m1", "205m4", "205m4",
"205m4", "205m5", "205m5", "205m5", "205m6", "205m6", "205m6",
"205m7", "205m7", "205m7", "600m1", "600m1", "600m1", "600m3",
"600m3", "600m3", "600m4", "600m4", "600m4", "600m5", "600m5",
"600m5", "600m7", "600m7", "600m7")), .Names = c("var1", "var2",
"id"), row.names = c(NA, -45L), class = "data.frame")

> head(myData)
   var1  var2  id
1 31.59 33.78 0m4
2 32.21 33.25 0m4
3 31.78    NA 0m4
4 31.34 32.05 0m5
5 31.61 32.59 0m5
6 31.61    NA 0m5



results1 <- aggregate(. ~  id ,data=myData,FUN=mean,na.rm=T)
 head(results1,1)
#id  var1  var2
# 1 0m11 30.79 32.27

library(data.table)
mydt <- data.table(myData)
setkey(mydt,id)
results2 <- mydt[,lapply(.SD,mean,na.rm=TRUE),by=id]
 head(results2,1)
#   id  var1  var2
# [1,] 0m11 30.84 32.27

library(plyr)
results3 <- ddply(myData,.(id),colwise(mean),na.rm=TRUE)
 head(results3,1)
#id  var1  var2
# 1 0m11 30.84 32.27
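
A likely explanation (worth verifying): the formula method of aggregate()
applies na.action = na.omit to whole rows before grouping, so a row with NA
in var2 is dropped from the var1 mean as well, while the data.table and plyr
versions drop NAs per column. Passing na.action = na.pass appears to
reconcile the results:

results1b <- aggregate(. ~ id, data = myData, FUN = mean, na.rm = TRUE,
                       na.action = na.pass)
head(results1b, 1)   # should now agree with results2 and results3 above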

> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] plyr_1.6 data.table_1.7.3



Re: [R] Reading a specific column of a csv file in a loop

2011-11-15 Thread Juliet Hannah
In the solution below, what is the advantage of using "0L"?

 M0 <- read.csv("M1.csv", nrows = 1)[0L, ]

Thanks!
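
As I understand it, 0L is simply the integer literal zero, and indexing with
zero rows keeps the column names and classes while dropping the data; a small
sketch:

identical(0L, as.integer(0))               # TRUE
M0 <- read.csv("M1.csv", nrows = 1)[0L, ]  # zero-row data frame: structure only
str(M0)
# plain 0 would also work; 0L just avoids an unnecessary double-to-integer coercion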

2011/11/8 Gabor Grothendieck :
> 2011/11/8 Sergio René Araujo Enciso :
>> Dear all:
>>
>> I have two larges files with 2000 columns. For each file I am
>> performing a loop to extract the "i"th element of each file and create
>> a data frame with both "i"th elements in order to perform further
>> analysis. I am not extracting all the "i"th elements but only certain
>> which I am indicating on a vector called "d".
>>
>> See  an example of my  code below
>>
>> ### generate an example for the CSV files, the original files contain
>> more than 2000 columns, here for the sake of simplicity they have only
>> 10 columns
>> M1<-matrix(rnorm(1000), nrow=100, ncol=10,
>> dimnames=list(seq(1:100),letters[1:10]))
>> M2<-matrix(rnorm(1000), nrow=100, ncol=10,
>> dimnames=list(seq(1:100),letters[1:10]))
>> write.table(M1, file="M1.csv", sep=",")
>> write.table(M2, file="M2.csv", sep=",")
>>
>> ### the vector containing the "i" elements to be read
>> d<-c(1,4,7,8)
>> P1<-read.table("M1.csv", header=TRUE)
>> P2<-read.table("M1.csv", header=TRUE)
>> for (i in d) {
>> M<-data.frame(P1[i],P2[i])
>> rm(list=setdiff(ls(),"d"))
>> }
>>
>> As the files are quite large, I want to include "read.table" within
>> the loop so as it only read the "i"th element. I know that there is
>> the option "colClasses" for which I have to create a vector with zeros
>> for all the columns I do not want to load. Nonetheless I have no idea
>> how to make this vector to change in the loop, so as the only element
>> with no zeros is the "i"th element following the vector "d". Any ideas
>> how to do this? Or is there anz other approach to load only an
>> specific element?
>>
>
> It's a bit messy if there are row names so let's generate M1.csv like this:
>
> write.csv(M1, file = "M1.csv", row.names = FALSE)
>
> Then we can do this:
>
> nc <- ncol(read.csv("M1.csv", nrows = 1))
> colClasses <- replace(rep("NULL", nc), d, NA)
> M1.subset <- read.csv("M1.csv", colClasses = colClasses)
>
> or using the same M1.csv that we just generated try this which uses
> sqldf with the H2 backend:
>
> library(sqldf)
> library(RH2)
>
> M0 <- read.csv("M1.csv", nrows = 1)[0L, ]
> M1.subset.h2 <- sqldf(c("insert into M0 (select * from csvread('M1.csv'))",
>        "select a, d, g, h from M0"))
>
> This is referred to as Alternative 3 in FAQ#10 Example 6a on the sqldf
> home page:
> http://sqldf.googlecode.com
> Alternative 1 and Alternative 2 listed there could also be tried.
>
> (Note that although sqldf has a read.csv.sql command we did not use it
> here since that command only works with the sqlite back end and the
> RSQLite driver has a max of 999 columns.)
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>


Re: [R] heritability estimation

2011-10-17 Thread Juliet Hannah
Search: "mcmcglmm heritability" to see some discussions using the
mcmcglmm package. This package is discussed often
on the mixed model list. You can also use the kinship package. It will
take some time to get familiar with R. Work through
a few of the examples for variance component models, and then I would
suggest posting specific questions to the mixed model list.

On Fri, Oct 14, 2011 at 9:49 AM, Moohbear  wrote:
> Hello,
>
> I'm looking for a method to estimate narrow sense heritability of traits in
> a RIL population. Papers I've checked either use either SAS or SPSS or do
> not give any details at all. I've found some reference to using variance
> components in ANOVA, using the kinship or wgaim packages, but I don't have a
> clue as to how to do any of this.
> Is there any way for a very R-illiterate user to do it?
>
> Thanks
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/heritability-estimation-tp3904908p3904908.html
> Sent from the R help mailing list archive at Nabble.com.
>


Re: [R] expression set (Bioconductor) problem

2011-10-09 Thread Juliet Hannah
Note that exprs returns a matrix, so we can manipulate that just as we
would for any other type of matrix. There is also a Bioconductor
mailing list, which may be helpful.
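
For instance, a sketch only (reusing the object names from the question below
and assuming Biobase is loaded):

m1 <- (exprs(present.10) + exprs(present.48)) / 2   # element-wise average of the matrices
m2 <- (exprs(absent.10)  + exprs(absent.48))  / 2
eset1 <- ExpressionSet(assayData = m1)              # wrap back up if an ExpressionSet is needed
eset2 <- ExpressionSet(assayData = m2)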

On Thu, Oct 6, 2011 at 4:56 AM, Clayton K Collings  wrote:
> Hello R people,
>
>> dim(exprs(estrogenrma))
>
> I have an expressionSet with 8 samples and 12695 features (genes)
>
>
>> estrogenrma$estrogen
>  present present absent absent present present absent absent
>> estrogenrma$time.h
>  10 10 10 10 48 48 48 48
>
> present <- grep("present", as.character(estrogenrma$estrogen))
> absent  <- grep("absent", as.character(estrogenrma$estrogen))
> ten <- grep("10", as.character(estrogenrma$time.h))
> fortyeight  <- grep("48", as.character(estrogenrma$time.h))
>
> present.10 <- estrogenrma[, intersect(present, ten)]
> present.48 <- estrogenrma[, intersect(present, fortyeight)]
> absent.10 <- estrogenrma[, intersect(absent, ten)]
> absent.48 <- estrogenrma[, intersect(absent, fortyeight)]
>
>
> present.10, present.48, absent.10, and absent.48 are four expression sets
> with two samples and 12695 features.
>
> How can I make 2 new expressionsets, each having 12695 features and one
> sample
> where
> expressionset1 = (present.10 + present.48) / 2
> expressionset2 = (absent.10 + absent.48) / 2
> ?
>
> Thanks,
> Clayton
>
> - Original Message -
> From: "Tal Galili" 
> To: "SML" 
> Cc: r-help@r-project.org
> Sent: Thursday, October 6, 2011 4:09:43 AM
> Subject: Re: [R] Mean(s) from values in different row?
>
> One way for doing it would be to combine the columns using paste and then
> use tapply to get the means.
>
> For example:
>
> set.seed(32341)
> a1 = sample(c("a","b"), 100,replace = T)
> a2 = sample(c("a","b"), 100,replace = T)
> y = rnorm(100)
> tapply(y,paste(a1,a2), mean)
>
>
>
> Contact
> Details:---
> Contact me: tal.gal...@gmail.com |  972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)
> --
>
>
>
>
> On Thu, Oct 6, 2011 at 8:40 AM, SML  wrote:
>
>> Hello:
>>
>> Is there a way to get a mean from values stored in different rows?
>>
>> The data looks like this:
>>  YEAR-1, JAN, FEB, ..., DEC
>>  YEAR-2, JAN, FEB, ..., DEC
>>  YEAR-3, JAN, FEB, ..., DEC
>>
>> What I want is the mean(s) for just the consecutive winter months:
>>  YEAR-1.DEC, YEAR-2.JAN, YEAR-2.FEB
>>  YEAR-2.DEC, YEAR-3.JAN, YEAR-3.FEB
>>  etc.
>>
>> Thanks.
>>


Re: [R] Printing an xtable with type = html

2011-10-01 Thread Juliet Hannah
Some of the comments in this post may be informative to you:

http://r.789695.n4.nabble.com/improve-formatting-of-HTML-table-td3736299.html

On Wed, Sep 28, 2011 at 6:21 AM, David Scott  wrote:
>
> I have been playing around with producing tables using xtable and the type =
> "html" argument when printing. For example, if xtbl is the output of a
> dataframe which has been run through xtable, using the command:
>
> print(xtbl, type = "html",
>      html.table.attributes = "border = '1', align = 'center'")
>
> I would be interested to see other examples of the use of xtable to produce
> html. There is a whole vignette on using xtable to produce all sorts of
> tables for incorporation into a TeX document but I have found no examples of
> producing html with any table attributes.
>
> Ideally xtable should be able to access a css file but I don't see any
> mechanism for doing that. Perhaps someone can enlighten me.
>
> David Scott
>
> --
> _
> David Scott     Department of Statistics
>                The University of Auckland, PB 92019
>                Auckland 1142,    NEW ZEALAND
> Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
> Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018
>


[R] sqldf syntax, selecting rows, and skipping

2011-09-29 Thread Juliet Hannah
I am using the example in this post:

https://stat.ethz.ch/pipermail/r-help/2010-October/257204.html

# create a file
write.table(iris,"iris.csv",row.names=FALSE,sep=",",quote=FALSE)


# this does not work
# has the syntax changed or  is there a mistake in my usage?
# the line from the post above is:
#  read.csv.sql("myfile.csv, sql = "select * from file 2000, 1000")

library(sqldf)
read.csv.sql("iris.csv", sql = "select * from file 5, 5")

# this works
# but i would like to keep the header

 read.csv.sql("iris.csv", sql = "select * from file limit
5",skip=5,header=FALSE)

# thanks
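
# a possible alternative that seems to keep the header
# (SQLite also accepts "limit n offset m"; worth verifying)

read.csv.sql("iris.csv", sql = "select * from file limit 5 offset 5")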


> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets
methods   base

other attached packages:
[1] sqldf_0.4-2   chron_2.3-42  gsubfn_0.5-7
proto_0.3-9.2 RSQLite.extfuns_0.0.1 RSQLite_0.9-4
DBI_0.2-5 myfunctions_1.0

loaded via a namespace (and not attached):
[1] tools_2.13.1



Re: [R] error building package: packaging into .tar.gz failed

2011-09-07 Thread Juliet Hannah
To follow up (because I received a few emails off-list), things work now.

I'm not sure what I did differently. In case it is helpful:

I reinstalled Rtools, and edited my path so that

C:\Rtools\bin;C:\Rtools\perl\bin;C:\Rtools\MinGW\bin;C:\Program
files\R\R-2.13.1\bin;C:\Program Files\HTML Help Workshop

was at the beginning.

With this, my attempts at package creation worked.

On Thu, Jun 30, 2011 at 12:51 PM, Juliet Hannah  wrote:
> I am trying to build a package using windows xp. Here is the error I am 
> getting:
>
> R CMD build myfunctions
>
> * checking for file 'myfunctions/DESCRIPTION' ... OK
> * preparing 'myfunctions':
> * checking DESCRIPTION meta-information ... OK
> * checking for LF line-endings in source and make files
> * checking for empty or unneeded directories
> * building 'myfunctions_1.0.tar.gz'
>  ERROR
> packaging into .tar.gz failed
>
> Could anyone suggest possible things to check? Thanks.
>
>> sessionInfo()
> R version 2.13.0 (2011-04-13)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
> States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>



[R] formatting a 6 million row data set; creating a censoring variable

2011-08-31 Thread Juliet Hannah
List,

Consider the following data.

   gender mygroup id
1   F   A  1
2   F   B  2
3   F   B  2
4   F   B  2
5   F   C  2
6   F   C  2
7   F   C  2
8   F   D  2
9   F   D  2
10  F   D  2
11  F   D  2
12  F   D  2
13  F   D  2
14  M   A  3
15  M   A  3
16  M   A  3
17  M   B  3
18  M   B  3
19  M   B  3
20  M   A  4

Here is the reshaping I am seeking (explanation below).

 id mygroup mytime censor
[1,]  1   A  1  1
[2,]  2   B  3  0
[3,]  2   C  3  0
[4,]  2   D  6  1
[5,]  3   A  3  0
[6,]  3   B  3  1
[7,]  4   A  1  1

I need to create 2 variables. The first one is a time variable.
Observe that for id=2, the variable mygroup=B was observed 3 times. In
the solution we see in row 2 that id=2 has a mytime variable of 3.

Next, I need to create a censoring variable.

Notice id=2 goes through values of B, C, D for mygroup. This means the
changes from B to C and from C to D are observed.  There is no change
from D. I need to indicate this with a 'censoring' variable. So B and
C would have values 0, and D would have a value of 1. As another
example, id=1 never changes, so I assign it censor=1. Overall, if a
change is observed, 0 should be assigned, and if a change is not
observed, 1 should be assigned.

One potential challenge is that the original data set has over 5
million rows. I have ideas, but I'm still getting used to the
data.table and plyr syntax.  I also seek a base R solution. I'll post
the timings on the real data set shortly.

Thanks for your help.

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

# Here is a simplified data set

myData <- structure(list(gender = c("F", "F", "F", "F", "F", "F", "F",
"F", "F", "F", "F", "F", "F", "M", "M", "M", "M", "M", "M", "M"
), mygroup = c("A", "B", "B", "B", "C", "C", "C", "D", "D", "D",
"D", "D", "D", "A", "A", "A", "B", "B", "B", "A"), id = c("1",
"2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "3",
"3", "3", "3", "3", "3", "4")), .Names = c("gender", "mygroup",
"id"), class = "data.frame", row.names = c(NA, -20L))


# here is plyr solution with  idata.frame

library(plyr)
imyData <-  idata.frame(myData)
timeData <- idata.frame(ddply(imyData, .(id,mygroup), summarize,
mytime = length(mygroup)))

makeCensor <- function(x) {
   myvec <- rep(0,length(x))
   lastInd <- length(myvec)
   myvec[lastInd] = 1
   myvec
}


plyrSolution <- ddply(timeData, "id", transform, censor = makeCensor(mygroup))


# here is a data table solution
# use makeCensor function from above

library(data.table)
mydt <- data.table(myData)
setkey(mydt,id,mygroup)

timeData <- mydt[,list(mytime=length(gender)),by=list(id,mygroup)]
makeCensor <- function(x) {
   myvec <- rep(0,length(x))
   lastInd <- length(myvec)
   myvec[lastInd] = 1
   myvec
}

mycensor <- timeData[,list(censor=makeCensor(mygroup)),by=id]
datatableSolution <- cbind(timeData,mycensor[,list(censor)])
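
# and a base R sketch of the same two steps (counts per (id, mygroup) cell,
# then flag the last group within each id); uses the myData object above

baseTime <- aggregate(list(mytime = rep(1, nrow(myData))),
                      by = list(id = myData$id, mygroup = myData$mygroup),
                      FUN = sum)
baseTime <- baseTime[order(baseTime$id, baseTime$mygroup), ]
baseTime$censor <- ave(rep(0, nrow(baseTime)), baseTime$id,
                       FUN = function(v) { v[length(v)] <- 1; v })
baseSolution <- baseTime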



Re: [R] data manipulation and summaries with few million rows

2011-08-24 Thread Juliet Hannah
Thanks Dennis! I'll check this out.

Just to clarify, I need the total number of switches/changes regardless of
whether that state had occurred in the past. So A-A-B-A would have 2
changes: A to B and B to A.

Thanks again.
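
A tiny check of that counting rule with rle(), as used in the solution quoted
below:

length(rle(c("A", "A", "B", "A"))$lengths) - 1   # 2 changes: A to B and B to A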


On Wed, Aug 24, 2011 at 1:28 PM, Dennis Murphy  wrote:
> Hi Juliet:
>
> Here's a Q & D solution:
>
> # (1) plyr
>> f <- function(d) length(unique(d$mygroup)) - 1
>> ddply(myData, .(id), f)
>  id V1
> 1  1  0
> 2  2  2
> 3  3  1
> 4  4  0
>
> # (2) data.table
>
> myDT <- data.table(myData, key = 'id')
> myDT[, list(nswitch = length(unique(mygroup)) - 1), by = 'id']
>
> If one can switch back and forth between levels more than once, then
> the above is clearly not appropriate. A more robust method would be to
> employ rle() [run length encoding]:
>
> g <- function(d) length(rle(d$mygroup)$lengths) - 1
> ddply(myData, .(id), g)    # gives the same answer as above
> myDT[, list(nswitch = length(rle(mygroup)$lengths) - 1), by = 'id']   # ditto
>
>
> HTH,
> Dennis
>
> On Wed, Aug 24, 2011 at 9:48 AM, Juliet Hannah  
> wrote:
>> I have a data set with about 6 million rows and 50 columns. It is a
>> mixture of dates, factors, and numerics.
>>
>> What I am trying to accomplish can be seen with the following
>> simplified data, which is given as dput output below.
>>
>>> head(myData)
>>      mydate gender mygroup id
>> 1 2012-03-25      F       A  1
>> 2 2005-05-23      F       B  2
>> 3 2005-09-08      F       B  2
>> 4 2005-12-07      F       B  2
>> 5 2006-02-26      F       C  2
>> 6 2006-05-13      F       C  2
>>
>> For each id, I want to count the number of changes of the variable
>> 'mygroup' that occur. For example, id=1 has 0 changes because it is
>> observed only once.  id=2 has 2 changes (B to C, and C to D).  I also
>> need to calculate the total observation time for each id using the
>> variable mydate.  In the end, I am trying to have a new data set in
>> which each row has an id, days observed, number of changes, and
>> gender.
>>
>> I made some simple summaries using data.table and plyr, but I'm stuck
>> on this reformatting.
>>
>> Thanks for your help.
>>
>> myData <- structure(list(mydate = c("2012-03-25", "2005-05-23", "2005-09-08",
>> "2005-12-07", "2006-02-26", "2006-05-13", "2006-09-01", "2006-12-12",
>> "2006-02-19", "2006-05-03", "2006-04-23", "2007-12-08", "2011-03-19",
>> "2007-12-20", "2008-06-15", "2008-12-16", "2009-06-07", "2009-10-09",
>> "2010-01-28", "2007-06-05"), gender = c("F", "F", "F", "F", "F",
>> "F", "F", "F", "F", "F", "F", "F", "F", "M", "M", "M", "M", "M",
>> "M", "M"), mygroup = c("A", "B", "B", "B", "C", "C", "C", "D",
>> "D", "D", "D", "D", "D", "A", "A", "A", "B", "B", "B", "A"),
>>    id = c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>>    3L, 3L, 3L, 3L, 3L, 3L, 4L)), .Names = c("mydate", "gender",
>> "mygroup", "id"), class = "data.frame", row.names = c(NA, -20L
>> ))
>>
>>> sessionInfo()
>> R version 2.13.1 (2011-07-08)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>


[R] data manipulation and summaries with few million rows

2011-08-24 Thread Juliet Hannah
I have a data set with about 6 million rows and 50 columns. It is a
mixture of dates, factors, and numerics.

What I am trying to accomplish can be seen with the following
simplified data, which is given as dput output below.

> head(myData)
  mydate gender mygroup id
1 2012-03-25  F   A  1
2 2005-05-23  F   B  2
3 2005-09-08  F   B  2
4 2005-12-07  F   B  2
5 2006-02-26  F   C  2
6 2006-05-13  F   C  2

For each id, I want to count the number of changes of the variable
'mygroup' that occur. For example, id=1 has 0 changes because it is
observed only once.  id=2 has 2 changes (B to C, and C to D).  I also
need to calculate the total observation time for each id using the
variable mydate.  In the end, I am trying to have a new data set in
which each row has an id, days observed, number of changes, and
gender.

I made some simple summaries using data.table and plyr, but I'm stuck
on this reformatting.

Thanks for your help.

myData <- structure(list(mydate = c("2012-03-25", "2005-05-23", "2005-09-08",
"2005-12-07", "2006-02-26", "2006-05-13", "2006-09-01", "2006-12-12",
"2006-02-19", "2006-05-03", "2006-04-23", "2007-12-08", "2011-03-19",
"2007-12-20", "2008-06-15", "2008-12-16", "2009-06-07", "2009-10-09",
"2010-01-28", "2007-06-05"), gender = c("F", "F", "F", "F", "F",
"F", "F", "F", "F", "F", "F", "F", "F", "M", "M", "M", "M", "M",
"M", "M"), mygroup = c("A", "B", "B", "B", "C", "C", "C", "D",
"D", "D", "D", "D", "D", "A", "A", "A", "B", "B", "B", "A"),
id = c(1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 4L)), .Names = c("mydate", "gender",
"mygroup", "id"), class = "data.frame", row.names = c(NA, -20L
))
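
# A minimal base R sketch of the two per-id summaries described above
# (observation time in days and number of changes); the result's column
# names are only illustrative.

myData$mydate <- as.Date(myData$mydate)
summ <- data.frame(
  id       = sort(unique(myData$id)),
  days     = tapply(as.numeric(myData$mydate), myData$id,
                    function(d) max(d) - min(d)),
  nchanges = tapply(myData$mygroup, myData$id,
                    function(g) length(rle(g)$lengths) - 1),
  gender   = tapply(myData$gender, myData$id, `[`, 1))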

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base



Re: [R] getting names of dimnames of xtabs into xtable latex output

2011-08-18 Thread Juliet Hannah
Thanks to Duncan Mackay and Dennis Murphy for help.

The following solution seems to give me what I need.

library(memisc)
toLatex(ftable(cyl ~ am,data=mtcars))

For this to work, we have to use:

\documentclass{article}
\usepackage{booktabs}
\usepackage{dcolumn}
\begin{document}

at the beginning of the tex file. I learned this from some of Paul
Johnson's notes I found online ("Getting Nice Latex tables out of R").





On Tue, Aug 16, 2011 at 11:06 PM, Duncan Mackay  wrote:
> Hi
>
> Will this fix the problem?
>
> str(table2)
> xtable(data.frame(table2))
> % latex table generated in R 2.13.1 by xtable 1.5-6 package
> % Wed Aug 17 13:02:38 2011
> \begin{table}[ht]
> \begin{center}
> \begin{tabular}{rllr}
>  \hline
>  & change\_diet & mydiet & Freq \\
>  \hline
> 1 & Don't know & 0 & 26.00 \\
>  2 & Somewhat likely & 0 & 0.00 \\
>  3 & Somewhat unlikely & 0 & 40.00 \\
>  4 & Very likely & 0 & 0.00 \\
>  5 & Very unlikely & 0 & 10.00 \\
>  6 & Don't know & 1 & 0.00 \\
>  7 & Somewhat likely & 1 & 188.00 \\
>  8 & Somewhat unlikely & 1 & 0.00 \\
>  9 & Very likely & 1 & 281.00 \\
>  10 & Very unlikely & 1 & 0.00 \\
>   \hline
> \end{tabular}
> \end{center}
> \end{table}
>
> Regards
>
> Duncan
>
>
> Duncan Mackay
> Department of Agronomy and Soil Science
> University of New England
> ARMIDALE NSW 2351
> Email: home mac...@northnet.com.au
>
>
>
>
>
> At 02:03 17/08/2011, you wrote:
>>
>> In R, the output of xtabs displays the names of the dimnames.  In the
>> example below, these are "change_diet" and "mydiet". Is there a way to
>> have xtable incorporate these names directly into the latex output.
>> Thanks for your help.
>>
>> table2 <- structure(c(26, 0, 40, 0, 10, 0, 188, 0, 281, 0), .Dim = c(5L,
>> 2L), .Dimnames = structure(list(change_diet = c("Don't know",
>> "Somewhat likely", "Somewhat unlikely", "Very likely", "Very unlikely"
>> ), mydiet = c("0", "1")), .Names = c("change_diet", "mydiet"
>> )), class = c("xtabs", "table"))
>> table2
>>
>>
>> library(xtable)
>> xtable(table2)
>>
>>
>>
>> > sessionInfo()
>> R version 2.13.1 (2011-07-08)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>> States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] xtable_1.5-6
>>


Re: [R] getting names of dimnames of xtabs into xtable latex output

2011-08-17 Thread Juliet Hannah
Thanks for the suggestion, Duncan.

However, I was trying to maintain the contingency
table/cross-classification structure of the original table.

My use of xtable on this table maintains the structure I want, but
the labels for the row names and column names are lost.



On Tue, Aug 16, 2011 at 11:06 PM, Duncan Mackay  wrote:
> Hi
>
> Will this fix the problem?
>
> str(table2)
> xtable(data.frame(table2))
> % latex table generated in R 2.13.1 by xtable 1.5-6 package
> % Wed Aug 17 13:02:38 2011
> \begin{table}[ht]
> \begin{center}
> \begin{tabular}{rllr}
>  \hline
>  & change\_diet & mydiet & Freq \\
>  \hline
> 1 & Don't know & 0 & 26.00 \\
>  2 & Somewhat likely & 0 & 0.00 \\
>  3 & Somewhat unlikely & 0 & 40.00 \\
>  4 & Very likely & 0 & 0.00 \\
>  5 & Very unlikely & 0 & 10.00 \\
>  6 & Don't know & 1 & 0.00 \\
>  7 & Somewhat likely & 1 & 188.00 \\
>  8 & Somewhat unlikely & 1 & 0.00 \\
>  9 & Very likely & 1 & 281.00 \\
>  10 & Very unlikely & 1 & 0.00 \\
>   \hline
> \end{tabular}
> \end{center}
> \end{table}
>
> Regards
>
> Duncan
>
>
> Duncan Mackay
> Department of Agronomy and Soil Science
> University of New England
> ARMIDALE NSW 2351
> Email: home mac...@northnet.com.au
>
>
>
>
>
> At 02:03 17/08/2011, you wrote:
>>
>> In R, the output of xtabs displays the names of the dimnames.  In the
>> example below, these are "change_diet" and "mydiet". Is there a way to
>> have xtable incorporate these names directly into the latex output.
>> Thanks for your help.
>>
>> table2 <- structure(c(26, 0, 40, 0, 10, 0, 188, 0, 281, 0), .Dim = c(5L,
>> 2L), .Dimnames = structure(list(change_diet = c("Don't know",
>> "Somewhat likely", "Somewhat unlikely", "Very likely", "Very unlikely"
>> ), mydiet = c("0", "1")), .Names = c("change_diet", "mydiet"
>> )), class = c("xtabs", "table"))
>> table2
>>
>>
>> library(xtable)
>> xtable(table2)
>>
>>
>>
>> > sessionInfo()
>> R version 2.13.1 (2011-07-08)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>> States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] xtable_1.5-6
>>


[R] getting names of dimnames of xtabs into xtable latex output

2011-08-16 Thread Juliet Hannah
In R, the output of xtabs displays the names of the dimnames.  In the
example below, these are "change_diet" and "mydiet". Is there a way to
have xtable incorporate these names directly into the LaTeX output?
Thanks for your help.

table2 <- structure(c(26, 0, 40, 0, 10, 0, 188, 0, 281, 0), .Dim = c(5L,
2L), .Dimnames = structure(list(change_diet = c("Don't know",
"Somewhat likely", "Somewhat unlikely", "Very likely", "Very unlikely"
), mydiet = c("0", "1")), .Names = c("change_diet", "mydiet"
)), class = c("xtabs", "table"))
table2


library(xtable)
xtable(table2)



> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] xtable_1.5-6



[R] improve formatting of HTML table

2011-08-11 Thread Juliet Hannah
I am trying to improve the look of an HTML table for a report (that
needs to be pasted into Word).

Here is an example.

table2 <- structure(c(26L, 0L, 40L, 0L, 10L, 0L, 0L, 188L, 0L, 281L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 4L), .Dim = c(6L, 3L), .Dimnames = structure(list(
myvar = c("Don't know", "Somewhat likely", "Somewhat unlikely",
"Very likely", "Very unlikely", NA), var_recode = c("0", "1",
NA)), .Names = c("myvar", "var_recode")), class = "table")


library("R2HTML")
.HTML.file = paste(getwd(), "/example.html", sep = "")
HTML(table2)


In the output, I would like to improve the justification of the
numbers (or any other suggestion to make
the HTML look nicer). The columns are a little hard to read.
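
One alternative might be to route the table through xtable's HTML driver
instead (untested sketch; I would still prefer to stay with R2HTML if it can
be tuned):

library(xtable)
print(xtable(table2), type = "html",
      file = paste(getwd(), "/example2.html", sep = ""))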

Thanks.

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] R2HTML_2.2

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] suggestions regarding reading in a messy file

2011-07-13 Thread Juliet Hannah
Thanks David. count.fields revealed the problem, and pointed me in a
direction to understand some basics
that I had missed.

Writing the original file with quote=TRUE solved the problem; reading the
original export back in with quote="" also fixed it.


On Tue, Jul 12, 2011 at 4:48 PM, David Winsemius  wrote:
>
> On Jul 12, 2011, at 4:37 PM, Juliet Hannah wrote:
>
>> I have a file in stata format, which I have read in, and I am trying
>> to create a text file. I have exported the data using various
>> delimiters, but I'm unable to read it back in. I originally read in
>> the file with:
>>
>> library(foreign)
>> myData <- read.dta("mydata.dta")
>>
>> I then exported it with write.table using comma, tab, and exclamation
>> marks as a delimiter.
>>
>> When I was unable to read in it, I used readLines to check the number
>> of fields in each row. For example, when using a comma, I checked the
>> number of entries in each line using:
>>
>> con <- file("
>> while ( length(oneLine <- readLines(con, 1)) ) {
>>  lineLength <- length(strsplit(oneLine,",")[[1]])
>>  cat(lineLength,"\n")
>>  }
>> close(con)
>>
>> This prints out 57 for each line.
>
> But does not test for unmatched quotes, extraneous "#",  and such.
>
> Try instead:
>
> count.fields("myfile.txt", sep=",")
>
>>
>> But then I try:
>>
>> cc <- rep("character",57)
>> myData <- read.table("myfile.txt",header=TRUE,sep=",",colClasses=cc)
>>
>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
>>  :
>>  line 10 did not have 57 elements
>>
>> I'm unable to post a sample of the data so I'm just looking for
>> suggestions. The data  is messy meaning some of the fields have
>> comments as the survey response. Still, I was able to work with it as
>> long as I read it in from the stata  file.
>>
>> I was trying to avoid using the 'fill' option because that has given
>> me problems before.
>>
>> Thanks for your help.
>>
>> Juliet
>>
>>> sessionInfo()
>>
>> R version 2.13.0 (2011-04-13)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>> States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>                     LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] foreign_0.8-43
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] suggestions regarding reading in a messy file

2011-07-12 Thread Juliet Hannah
I have a file in stata format, which I have read in, and I am trying
to create a text file. I have exported the data using various
delimiters, but I'm unable to read it back in. I originally read in
the file with:

library(foreign)
myData <- read.dta("mydata.dta")

I then exported it with write.table using comma, tab, and exclamation
marks as a delimiter.

When I was unable to read in it, I used readLines to check the number
of fields in each row. For example, when using a comma, I checked the
number of entries in each line using:

con <- file("myfile.txt", "r")
while ( length(oneLine <- readLines(con, 1)) ) {
   lineLength <- length(strsplit(oneLine,",")[[1]])
  cat(lineLength,"\n")
   }
close(con)

This prints out 57 for each line.

But then I try:

 cc <- rep("character",57)
 myData <- read.table("myfile.txt",header=TRUE,sep=",",colClasses=cc)

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 10 did not have 57 elements

 I'm unable to post a sample of the data so I'm just looking for
suggestions. The data  is messy meaning some of the fields have
comments as the survey response. Still, I was able to work with it as
long as I read it in from the stata  file.

I was trying to avoid using the 'fill' option because that has given
me problems before.

Thanks for your help.

Juliet

> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C
  LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] foreign_0.8-43

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] deming regresion to make 2 variables comparable

2011-07-11 Thread Juliet Hannah
See if the following thread

http://www.mail-archive.com/r-help@r-project.org/msg85070.html

and the paper cited in it are helpful. Terry Therneau provides code
for a Deming regression.
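
For reference, a minimal base-R sketch of Deming regression (this is not
Terry Therneau's code; delta is the ratio of the two error variances, and
delta = 1 gives orthogonal regression):

deming_fit <- function(x, y, delta = 1) {
  sxx <- var(x); syy <- var(y); sxy <- cov(x, y)
  slope <- (syy - delta * sxx +
            sqrt((syy - delta * sxx)^2 + 4 * delta * sxy^2)) / (2 * sxy)
  c(intercept = mean(y) - slope * mean(x), slope = slope)
}
# e.g. deming_fit(mydata$var1, mydata$var2)   # hypothetical column names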



On Thu, Jul 7, 2011 at 12:58 PM, devon woodcomb  wrote:
> Hi,
>
> I have a dataset which has var1 from 1 source and var2 from 2 different
> methods. I need a new variable such that var2 values from both methods can
> be used as 1 variable.
> I believe deming regression can be used to do this. I just don't know how to
> do it.
>
> My data looks like:
> id   var1   var2method1   var2method2
> 1    1.2    2.1           NA
> 2    1.6    2.4           NA
> 3    1.5    2.2           NA
> 4    1.3    NA            2.8
> 5    1.6    NA            3.1
> 6    1.4    NA            2.7
>
>
> Please help.
>
> Sincerely,
> Devon
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] error building package: packaging into .tar.gz failed

2011-06-30 Thread Juliet Hannah
I am trying to build a package using windows xp. Here is the error I am getting:

R CMD build myfunctions

* checking for file 'myfunctions/DESCRIPTION' ... OK
* preparing 'myfunctions':
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building 'myfunctions_1.0.tar.gz'
 ERROR
packaging into .tar.gz failed

Could anyone suggest possible things to check? Thanks.

> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] indexing with which, logical indexing, and missing values

2011-06-28 Thread Juliet Hannah
I have a data frame in which missing values exist, and I need to
recode the string "missing" to a missing value. For the example, let's
assume I cannot do this while reading it in. Even though this has been
discussed extensively, I'm still a little confused about when to index
with "which" and when to use logical indexing.

Below is what I have done. Is there a more R-appropriate way to do this? Thanks.


# create data with missing values

myData <- head(mtcars)
myData[c(1,2),'cyl'] <- "missing"
is.na(myData[3,'cyl']) <- TRUE
myData[c(1,2,5),'disp'] <- "missing"
myData


# loop through columns to replace "missing"

myColNames <- colnames(myData)

for (myCol in myColNames) {
   NA_index <- which(myData[,myCol] =="missing")

# which creates problems when no columns have "missing"

   if (length(NA_index) > 0) {
  is.na(myData[NA_index,myCol]) <- TRUE
   }
}

myData
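
A shorter variant for reference (untested sketch): %in% returns FALSE for NA,
so no length() check is needed and existing NAs are left alone.

myData[] <- lapply(myData, function(x) replace(x, x %in% "missing", NA))
myData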

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help on selecting genes showing highest variance

2011-06-10 Thread Juliet Hannah
# Let's say your expression data is in a matrix
# named expression in which the rows are genes
# and the columns are samples

myvars <- apply(expression,1, var,na.rm=TRUE)
myvars <- sort(myvars,decreasing=TRUE)
myvars <- myvars[1:200]
expression <- expression[names(myvars),]
dim(expression)


Also check out the genefilter package in bioconductor. You may find
the bioconductor
mailing list is better for questions like this one.


On Tue, Jun 7, 2011 at 9:47 AM, GIS Visitor 33  wrote:
> Hi
>
> I have a problem for which I would like to know a solution. I have a gene 
> expression data and I would like to choose only lets say top 200 genes that 
> had the highest expression variance across patients.
>
> How do i do this in R?
>
> I tried x=apply(leukemiadata,1,var)
> x1=x[order(-1*x)]
>
> but the problem here is  x and x1 are numeric data , If I choose the first 
> 200 after sorting in descending, so I do not know how to choose the 
> associated samples with just the numeric values.
>
> Kindly help!
>
>
> Regards
> Ap
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subsetting a data frame by dropping correlated variables

2011-04-27 Thread Juliet Hannah
The 'findCorrelation' function in the caret package may be helpful.
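
Rough sketch of the idea (assumes dfQ holds only numeric columns):

library(caret)
corMat   <- cor(dfQ, use = "pairwise.complete.obs", method = "pearson")
drop_idx <- findCorrelation(corMat, cutoff = 0.8)  # columns suggested for removal
dfQuc    <- if (length(drop_idx)) dfQ[, -drop_idx, drop = FALSE] else dfQ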


On Tue, Apr 19, 2011 at 3:10 PM, Rita Carreira  wrote:
>
> Hello R Users!
> I have a data frame that has many variables, some with missing observations, 
> and some that are correlated with each other. I would like to subset the data 
> by dropping one of the variables that is correlated with another variable 
> that I will keep int he data frame. Alternatively, I could also drop both the 
> variables that are correlated with each other. Worry not! I am not deleting 
> data, I am just finding a subset of the data that I can use to impute some 
> missing observations.
> I have tried the following statement
> dfQuc <- dfQ[ , sapply(dfQ, function(x) cor(dfQ, use = 
> "pairwise.complete.obs", method ="pearson")<0.8)]
> but it gives me the following error:
> Error in `[.data.frame`(dfQ, , sapply(dfQ, function(x) cor(dfQ, use = 
> "pairwise.complete.obs",  :
>  undefined columns selected
> Since I have several dozen data frames, it is impractical for me to manually 
> inspect the correlation matrices and select which variables to drop, so I am 
> trying to have R make the selection for me. Does any one have any idea on how 
> to accomplish this?
> Thank you very much!
> Rita = "If you think education is 
> expensive, try ignorance."--Derek Bok
>
>
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GLM output for deviance and loglikelihood

2011-04-20 Thread Juliet Hannah
As you mentioned, the deviance does not always reduce to:

D = -2(loglikelihood(model))

It does for ungrouped data, such as for binary logistic regression. So
let's stick with the original definition.
In this case, we need the log-likelihood for the saturated model.

x = rnorm(10)

 y = rpois(10,lam=exp(1 + 2*x))

 test = glm(formula = y ~ x, family = poisson)


sm <- glm(y ~ factor(1:10),family=poisson)

mydev <- as.numeric(2*(logLik(sm)-logLik(test)))
mydev
deviance(test)


On Fri, Apr 15, 2011 at 7:00 AM, Jeffrey Pollock
 wrote:
> It has always been my understanding that deviance for GLMs is defined
> by;
>
>
>
> D =  -2(loglikelihood(model) - loglikelihood(saturated model))
>
>
>
> and this can be calculated by (or at least usually is);
>
>
>
> D = -2(loglikelihood(model))
>
>
>
> As is done so in the code for 'polr' by Brian Ripley (in the package
> 'MASS') where the -loglikehood is minimised using optim;
>
>
>
> res <- optim(s0, fmin, gmin, method = "BFGS", hessian = Hess, ...)
>
> .
>
> .
>
> .
>
> deviance <- 2 * res$value
>
>
>
> If so, why is it that;
>
>
>
>> x = rnorm(10)
>
>> y = rpois(10,lam=exp(1 + 2*x))
>
>> test = glm(formula = y ~ x, family = poisson)
>
>> deviance(test)
>
> [1] 5.483484
>
>> -2*logLik(test)
>
> [1] 36.86335
>
>
>
> I'm clearly not understanding something here, can anyone shed any light?
> Why is;
>
>
>
> -2*logLik(test) =/= deviance(test) ???
>
>
>
> I think this is something that is poorly understood all over the
> internet (at least from my google searches anyway!)
>
>
>
> Thanks,
>
>
>
> Jeff
>
>
>
> - 
> ** Confidentiality: The contents 
> of this e-mail and any attachments transmitted with it are intended to be 
> confidential to the intended recipient; and may be privileged or otherwise 
> protected from disclosure. If you are not an intended recipient of this 
> e-mail, do not duplicate or redistribute it by any means. Please delete it 
> and any attachments and notify the sender that you have received it in error. 
> This e-mail is sent by a William Hill PLC group company. The William Hill 
> group companies include, among others, William Hill PLC (registered number 
> 4212563), William Hill Organization Limited (registered number 278208), 
> William Hill Credit Limited (registered number 413846), WHG (International) 
> Limited (registered number 99191) and WHG Trading Limited (registered number 
> 101439). Each of William Hill PLC, William Hill Organization Limited and 
> William Hill Credit Limited is registered in England and Wales and has its 
> registered office at Greenside House, 50 Station 
> Road, Wood Green, London N22 7TP. Each of WHG (International) Limited and WHG 
> Trading Limited is registered in Gibraltar and has its registered office at 
> 6/1 Waterport Place, Gibraltar. Unless specifically indicated otherwise, the 
> contents of this e-mail are subject to contract; and are not an official 
> statement, and do not necessarily represent the views, of William Hill PLC, 
> its subsidiaries or affiliated companies. Please note that neither William 
> Hill PLC, nor its subsidiaries and affiliated companies can accept any 
> responsibility for any viruses contained within this e-mail and it is your 
> responsibility to scan any emails and their attachments. William Hill PLC, 
> its subsidiaries and affiliated companies may monitor e-mail traffic data and 
> also the content of e-mails for effective operation of the e-mail system, or 
> for security, purposes. *
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] no solution yet, please help: extract p-value from mixed model in kinship package

2011-04-18 Thread Juliet Hannah
Maybe the pedigree is not set up correctly. If this is the case, the
kinship matrix will not be constructed correctly. I see that in this
example,
the diagonal terms differ.

diag(kmat)

lmekin runs fine for me, and I can extract p-values with:

   lmekinfit <- lmekin(...)
   pval <- lmekinfit$ctable;



On Fri, Apr 15, 2011 at 9:30 AM, Ram H. Sharma  wrote:
>  I am making the question clear. Please help.
>
>
>
>> Dear R experts
>>
>> I was using kinship package to fit mixed model with kinship matrix.
>> The package looks like lme4, but I could find a way to extract p-value
>> out of it. I need to extract is as I need to analyse large number of
>> variables (> 1).
>>
>> Please help me:
>>
>> require(kinship)
>>
>> #Generating random example  data
>>
>
>
>> #pedigree data*
>
>
> id <- 1:100
>
> dadid <- c(rep(0, 5), rep(1, 5), rep(3, 5), rep(5, 5), rep(7, 10),
> rep(9, 10), rep(11, 10), rep(13, 10), rep(15, 10), rep(17, 10),
> rep(19, 10), rep(21, 10))
>
> momid <- c(rep(0, 5), rep(2, 5), rep(4, 5), rep(6, 5), rep(8, 10),
> rep(10, 10), rep(12, 10), rep(14, 10), rep(16, 10), rep(18, 10),
> rep(20, 10), rep(22, 10) )
>
> ped <- data.frame(id, dadid, momid)
>
> # *kmatrix**
>
>> cfam <- makefamid(ped$id,ped$momid, ped$dadid)
>>
>> kmat <- makekinship(cfam, ped$id, ped$momid, ped$dadid)
>>
>> #*x and y variables
> *
>
>> set.seed(3456)
>>
>> dat <- sample(c(-1,0,1), 10000, replace = TRUE)
>>
>> snpmat<- data.frame(matrix(dat, ncol = 100))
>>
>> names(snpmat) <- c(paste ("VR",1:100, sep='' ))
>>
>> yvar <- rnorm(100, 30, 5)
>> covtrait <-  rnorm(100, 10, 5)
>>
>> mydata <- data.frame(id, yvar, covtrait, snpmat)
>>
> #**mixed model in lmekin
> ***
>
>>
>> fmod <- lmekin(yvar ~ mydata[,3] , data= mydata, random = ~1|id,
>> varlist=list(kmat)) $coefficients[2,4] # does not work
>>
>> # **error
>> message
>
>
>
>> Error message:
>
> Error in lmekin(yvar ~ mydata[, 3], data = mydata, random = ~1 | id, varlist
> = list(kmat))$coefficients[2,  :
>  incorrect number of dimensions
> In addition: Warning message:
> In coxme.varcheck(ncluster, varlist, n, gvars, groups, sparse, rescale,  :
>  Diagonal of variance matrix is not constant
>
> #**ultimate target: to put in
> loop***
>
>> Ultimately I want to put into the loop:
>>
>> for(i in 3:length(mydata)) {
>>
>> P <- vector (mode="numeric", length = 1000)
>>
>> P[i] <- lmekin(yvar~ mydata[,i] , data= mydata, random = ~1|id,
>> varlist=list(kmat)) $coefficients[2,4]
>>
>> }
>>
>
>
>> Same errors: I tried lme4 conventioned but did not work !
>
>
>
>> I can extract fixed effects as well as I do in lme4
>>  b <- fixef(fit1)
>>
>
>
>> Error in UseMethod("fixef") :
>>   no applicable method for 'fixef' applied to an object of class "lmekin"
>>
>>
>> --
>>
>> Ram H
>>
>
>
>
> --
>
> Ram H
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] converting affybatch object to matrix

2011-04-04 Thread Juliet Hannah
Use exprs on the output from RMA (or another method you like)

library("affy")
myData <-ReadAffy()
myRMA <- rma(myData)
e = exprs(myRMA)

Also, check out the  Bioconductor mailing list where
Bioconductor-related topics are discussed.


On Fri, Apr 1, 2011 at 9:54 AM, Landes, Ezekiel
 wrote:
> I have an Affybatch object called "batch" :
>
>>
>> batch
> AffyBatch object
> size of arrays=1050x1050 features (196 kb)
> cdf=HuGene-1_0-st-v1 (32321 affyids)
> number of samples=384
> number of genes=32321
> annotation=hugene10stv1
> notes=
>>
>>
>
> Is there a way of converting a portion of this data into a matrix? More 
> specifically, a matrix where the 384 samples are columns and the 32321 genes 
> are rows? The "exprs" function returns a matrix that has 384 columns but for 
> some reason there are 1050^2 rows.
>
> Thanks!
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About proportional odds ratio model with LASSO in ordinal regression

2011-03-27 Thread Juliet Hannah
If you can work with a different penalty, check out the lrm function
from the rms package, which uses
penalized likelihood to fit proportional odds models.
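
A rough sketch (this is a quadratic/ridge-type penalty rather than the LASSO,
and the data below are just a toy example):

library(rms)
set.seed(1)
d <- data.frame(y  = factor(sample(1:4, 200, replace = TRUE), ordered = TRUE),
                x1 = rnorm(200), x2 = rnorm(200))
fit <- lrm(y ~ x1 + x2, data = d, penalty = 2, x = TRUE, y = TRUE)
fit
# pentrace(fit, seq(0, 10, by = 0.5))   # one way to compare penalty values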

2011/3/24 Jheng-Jhong Wang :
> Dear R-users,
>
>         I try to fit proportional odds ratio model "with LASSO" in
> ordinal regression.
>  But I just find as below:
> "glmnet" package which used in Binomail and Multinomail response.
> "glmnetcr"  package which used in contiuation-Ratio Logits model.
> Does someone know a package and/or function on R to do proportional
> odds ratio model with LASSO together?
> Thanks.
>
> ---
> Jheng Jhong Wang
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] covar

2011-02-20 Thread Juliet Hannah
Relatedness is often defined in terms of the kinship matrix. It may be
helpful to search for this. Several packages in R use this matrix
including the kinship package.
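
A loose sketch with the kinship package (untested here; check how the version
you have codes founders, 0 vs NA, and note that non-founders need both parents
listed). The additive relationship matrix A is twice the kinship matrix:

library(kinship)
id  <- 1:5
dad <- c(0, 0, 1, 1, 3)
mom <- c(0, 0, 2, 2, 4)
kmat <- kinship(id, dad, mom)
A    <- 2 * kmat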

On Wed, Feb 16, 2011 at 3:14 PM, Val  wrote:
> Hi all,
>
> I want to construct relatedness among individuals and have a look at the
> following script.
>
> #
> rm(list=ls())
>
> N=5
> id   = c(1:N)
> dad = c(0,0,0,3,3)
> mom  = c(0,0,2,1,1)
> sex  = c(2,2,1,2,2) # 1= M and 2=F
>
>   A=diag(nrow = N)
>   for(i in 1:N)    {
>      for(j in i:N)      {
>         ss = dad[j]
>         dd = mom[j]
>         sx = sex[j]
>          if( ss > 0 && dd > 0 )
>            {
>              if(i == j)
>                   { A[i,i] = 1 + 0.5*A[ss,dd] }
>                 else
>                  { A[i,j] = A[i,ss] + 0.5*(A[i,dd])
>                    A[j,i] = A[i,j] }
>            }
>
>      } #inner for loop
>     } # outer for loop
>  A
>
> If the sex is male ( sex=1)  then I want to set A[i,i]=0.5*A[ss,dd]
> If it is female ( sex=2) then A[i,i] = 1 + 0.5*A[ss,dd]
>
>
> How do I do it ?
>
> I tried several cases but it did not work from me. Your assistance is
> highly  appreciated  in advance
>
> Thanks
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] series of boxplots

2011-02-11 Thread Juliet Hannah
If you could provide a small example of an actual data set (using
dput), you may get some suggestions
specific to your goals.

Here are a few examples of boxplots. If these look along the lines of
what you are looking for, you may want to search
the ggplot2 mailing list for more examples.

library(ggplot2)
qplot(factor(cyl), mpg, data=mtcars, geom="boxplot")

# example 2

mydata <- data.frame( group1=
sample(c("C","D"),size=100,replace=TRUE),
group2=sample(c("E","F"),size=100,replace=TRUE),  y=rnorm(100))

qplot(group2, y, data=mydata, facets = ~ group1,  geom="boxplot")

On Mon, Feb 7, 2011 at 6:16 AM, syrvn  wrote:
>
> hi group,
>
> imagine the following data frame df:
>
> 1 2 3 4 ...
> A 5 1 ..
> A 4 3 ..
> A 3 4 ..
> B 7 9 ..
> B 8 1 ..
> B 6 8 ..
>
> I tried the following and some variations to plot this matrix as boxplots:
>
>
> boxplot(df[1:3,2]~df[1:3,1], xlim=c(1,10))
> par(new=TRUE)
> boxplot(cpd12[4:6,2]~df[1:3,1], xlim=c(2,10))
> par(new=TRUE)
> boxplot(df[1:3,3]~df[1:3,1], xlim=c(1,10))
> par(new=TRUE)
> boxplot(cpd12[4:6,3]~df[1:3,1], xlim=c(2,10))
>
>
> can anybody help?
> Cheers
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/series-of-boxplots-tp3263938p3263938.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GWAF package: lme.batch.imputed(): object 'kmat' not found

2011-02-07 Thread Juliet Hannah
GWAF uses the kinship package. The documentation is pretty good for
it, and I've used it successfully. It may be helpful to get that
working before trying to automate some tasks using GWAF.
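
One quick sanity check worth doing (sketch): look at what object names the
kinship .RData file actually supplies, since the error suggests something
expected under the name 'kmat' was not found after loading it.

loaded <- load("imputed_fhs.kinship.RData")
loaded                 # names of the objects stored in the file
str(get(loaded[1]))    # should be the (possibly sparse) kinship matrix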

On Fri, Feb 4, 2011 at 2:20 PM, Jim Moon  wrote:
> Hello, All,
>
> GWAF 1.2
> R.Version() is below.
>
> system(lme.batch.imputed(
> phenfile = 'phenfile.csv',
> genfile = 'CARe_imputed_release.0.fhsR.gz',
> pedfile='pedfile.csv',
> phen='phen1',
> covar=c('covar1','covar2'),
> kinmat='imputed_fhs.kinship.RData',
> outfile='imputed.FHS.IBC.GWAF.LME.output.0.txt'
> ))
>
> Gives the error messages:
>
> Error in coxme.varcheck(ncluster, varlist, n, gvars, groups, sparse, rescale, 
>  :
>  object 'kmat' not found
> Error in lme.cov.out$theta : $ operator is invalid for atomic vectors
>
> Might anyone know why?
>
> Thank you for your time!
>
> Jim
>
> ---
>> R.Version()
> $platform
> [1] "x86_64-apple-darwin9.6.0"
>
> $arch
> [1] "x86_64"
>
> $os
> [1] "darwin9.6.0"
>
> $system
> [1] "x86_64, darwin9.6.0"
>
> $status
> [1] ""
>
> $major
> [1] "2"
>
> $minor
> [1] "12.1"
>
> $year
> [1] "2010"
>
> $month
> [1] "12"
>
> $day
> [1] "16"
>
> $`svn rev`
> [1] "53855"
>
> $language
> [1] "R"
>
> $version.string
> [1] "R version 2.12.1 (2010-12-16)"
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with factor analysis

2011-01-27 Thread Juliet Hannah
It looks like the text didn't show assigning the results of factanal
to an object. Try:

pgdata<-read.table("pgfull.txt",header=T)
names(pgdata)
pgd<-pgdata[,1:54]
#missing line
model <- factanal(pgd,8)

par(mfrow=c(2,2))
plot(loadings(model)[,1],loadings(model)[,2],pch=16,xlab="Factor
1",ylab="Factor 2")
plot(loadings(model)[,1],loadings(model)[,3],pch=16,xlab="Factor
1",ylab="Factor 3")
plot(loadings(model)[,1],loadings(model)[,4],pch=16,xlab="Factor
1",ylab="Factor 4")
plot(loadings(model)[,1],loadings(model)[,5],pch=16,xlab="Factor
1",ylab="Factor 5")


On Mon, Jan 24, 2011 at 7:03 AM, Simon Hayward
 wrote:
> Hi all,
>
>
>
> I am using the example on page 737 of "The R Book" by Michael J Crawley,
> to plot factor loadings against each other (in a multivariate analysis).
>
>
>
> However the following line code
>
>
>
>
>
> plot(loadings(model)[,1],loadings(model)[,2],pch=16,xlab="Factor 1",
> ylab="Factor 2")
>
>
>
> throws an error message
>
>
>
> "Error in plot.window(...) : need finite 'xlim' values
>
> In addition: Warning messages:
>
> 1: In min(x) : no non-missing arguments to min; returning Inf
>
> 2: In max(x) : no non-missing arguments to max; returning -Inf
>
> 3: In min(x) : no non-missing arguments to min; returning Inf
>
> 4: In max(x) : no non-missing arguments to max; returning -Inf"
>
>
>
> I have tried putting in values for the x and y limits, but even then no
> points appear in my plots.
>
>
>
> Does anyone know what I am doing wrong?
>
>
>
> Grateful for any help.
>
>
>
> Simon Hayward
>
>
>
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Heat map in R

2011-01-09 Thread Juliet Hannah
Make sure your data is a matrix. There are many examples of expression heatmaps
available on the bioconductor list. After checking out these examples, I would
post to the bioconductor list if you are still having problems. Also
consider starting with a small
example to get a working heatmap first. You have to install two
bioconductor packages for this by using:

source("http://bioconductor.org/biocLite.R";)
biocLite(c("ALL","genefilter"))

This will also install other bioconductor packages  that are needed.

#Then try:

library(ALL)
library(genefilter)
data(ALL)

#  just creating example data

 bcell <- grep("^B", as.character(ALL$BT))
 types <- c("NEG", "BCR/ABL")
 mysub <- which(as.character(ALL$mol.biol) %in% types)
 bc  <- ALL[, intersect(bcell, mysub)]
 bc$BT <- factor(bc$BT)
 bc$mol.biol <- factor(bc$mol.biol)
filter_bc  <- nsFilter(bc,var.cutoff=0.9)
 myfilt  <- filter_bc$eset
e <- exprs(myfilt)
# end of data creation

dim(e)
#[1] 880  79

class(e)
# [1] "matrix"

heatmap(e)
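
For data laid out like yours (gene, copy number, sample id in long format), a
rough sketch of the reshaping step would be the following; the column names
Copy.Number and Sample.ID are assumptions based on how read.table would fix up
your header, and missing gene/sample combinations come out as 0 with xtabs:

wide <- xtabs(Copy.Number ~ Gene + Sample.ID, data = data)
mat  <- matrix(wide, nrow = nrow(wide), dimnames = dimnames(wide))
dim(mat)    # should be roughly 900 genes by 43 samples
heatmap(mat, Rowv = NA, Colv = NA, col = cm.colors(256),
        scale = "column", margins = c(5, 10))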





On Wed, Jan 5, 2011 at 4:33 PM, lraeb...@sfu.ca  wrote:
>
> Hello,
> I am trying to make a heatmap in R and am having some trouble. I am very new
> to the world of R, but have been told that what I am trying to do should be
> possible.  I want to make a heat map that looks like  a gene expression
> heatmap (see http://en.wikipedia.org/wiki/Heat_map).
>
> I have 43 samples and 900 genes (yes I know this will be a huge map). I also
> have copy numbers associated with each gene/sample and need these to be
> represented as the colour intensities on the heat map.  There are multiple
> genes per sample with different copy numbers. I think my trouble may be how
> I am setting up my data frame.
>
> My data frame was created in excel as a tab deliminated text file:
>
> Gene   Copy Number   Sample ID
> A       1935              01
> B       2057              01
> C       2184              02
> D       1498              03
> E       2294              03
> F       2485              03
> G       1560              04
> H       3759              04
> I       2792              05
> J       7081              05
> K       1922              06
> .        .                .
> .        .                .
> .        .                .
> ZZZ     1354              43
>
>
> My code in R is something like this:
>
> data<-read.table("/Users/jsmt/desktop/test.txt",header=T)
>
> data_matrix<-data.matrix(data)
>
> data_heatmap <- heatmap(data_matrix, Rowv=NA, Colv=NA, col = cm.colors(256),
> scale="column", margins=c(5,10))
>
> I end up getting a heat map split into 3 columns: sample, depth, gene and
> the colours are just in big blocks that don't mean anything.
>
> Can anyone help me with my dataframe or my R code?  Again, I am fairly new
> to R, so if you can help, please give me very detailed help :)
>
> Thanks in advance!
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Heat-map-in-R-tp3176478p3176478.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to add frequencies to barplot

2010-11-23 Thread Juliet Hannah
Also check out the following post:

http://permalink.gmane.org/gmane.comp.lang.r.general/210897
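
In base graphics the basic idea looks roughly like this (sketch): barplot()
returns the bar midpoints, which text() can reuse to print the relative
frequencies just above each bar.

x2   <- rep(c(0:3), c(13, 80, 60, 27))
tab  <- table(x2)
rel  <- round(prop.table(tab), 2)
mids <- barplot(tab, density = 4, ylim = c(0, max(tab) * 1.15))
text(mids, tab, labels = rel, pos = 3)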

On Sat, Nov 20, 2010 at 4:32 PM, casperyc  wrote:
>
> Hi,
>
> I have count data
>
> x2=rep(c(0:3),c(13,80,60,27))
>
> x2
>  0  1  2  3
> 13 80 60 27
>
> I want to graph to be ploted as
>
> barplot(table(x2),density=4)
>
> how do I add relative frequency to it, like in
>
> hist(x2,labels=T)
>
> above the 'bar's
>
> Thanks.
>
> casper
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/how-to-add-frequencies-to-barplot-tp3051923p3051923.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Number above the bar?

2010-11-14 Thread Juliet Hannah
Check out ggplot2, specifically geom_bar and geom_text.

http://had.co.nz/ggplot2/

You have to get used to its syntax, which can take some time, but
after that it can make things a lot easier. Here is an example.

library(ggplot2)

df <- data.frame(xvar = factor(c(1, 2)),
yvar = c(1, 5))

p <- ggplot(df, aes( y=yvar, x=xvar))
p <- p + geom_bar( stat="identity")
p <- p + geom_text(aes(y=yvar+1,label=yvar))
p


On Thu, Nov 11, 2010 at 4:03 AM, Joel  wrote:
>
> Hi
>
> I got an barplot, and I would like to have the exact number of the bars just
> above the bars anyone know how to do this?
>
> Sorry for bad English, and I do hope that you understand what im after.
>
> //Joel
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Number-above-the-bar-tp3037438p3037438.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Ordered logit with polr won't match SPSS output

2010-09-30 Thread Juliet Hannah
I think the most common reason to see different parameter estimates
with ordinal regression is that programs set up
the model differently.

For example, check out

library(MASS)
?polr

We see polr uses:

logit P(Y <= k | x) = zeta_k - eta

and notes that other software packages may use the opposite sign for
eta. Also check that the ordered variables have
the same order (reference category); I have sometimes messed that up.
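
A small illustration of the sign issue (sketch, using the standard example
from ?polr): software that fits logit P(Y <= k | x) = zeta_k + eta would
report the negatives of polr's coefficients.

library(MASS)
house <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
coef(house)     # polr's parameterization (zeta_k - eta)
-coef(house)    # what the opposite sign convention would show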



On Mon, Sep 27, 2010 at 2:53 PM, Ben Hunter  wrote:
> I am learning R via a textbook that performs analysis with SPSS and SAS. In
> trying to reproduce the results for an ordinal logit model, I get very
> similar point estimates for my cut-off points, but the parameters for the
> covariate q60 do not match. The estimate for q51 also matches. Is this
> because I need to change a base case for the ordered covariate q60? Can this
> be done in or is it always the first case?
>
> mod<-polr(as.ordered(q43j)~as.ordered(q60)+q51, method="logistic")
>
> Perhaps a book using R would be a better idea, but it's the content and
> price (free) of this book that I like.
>
> Thanks a lot,
>
> Ben
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] post

2010-09-18 Thread Juliet Hannah
See if  rowttests is any faster.

library(genefilter)
?rowttests

You have to install Bioconductor. I've used this on large datasets,
but I haven't compared
timings.
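
Sketch of the rowttests() idea: one call computes a two-group t test for every
row of a matrix, so the per-row loop goes away (toy data below).

library(genefilter)
tab <- matrix(rnorm(1000 * 6), nrow = 1000,
              dimnames = list(NULL, c("a", "b", "d", "e", "f", "g")))
grp <- factor(c("g1", "g1", "g1", "g2", "g2", "g2"))   # a,b,d vs e,f,g
res <- rowttests(tab, grp)
head(res$p.value)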

On Mon, Sep 13, 2010 at 4:26 PM, Alexey Ush  wrote:
> Hello,
>
> I have a question regarding how to speed up the t.test on large dataset. For 
> example, I have a table "tab" which looks like:
>
>        a       b       c       d       e       f       g       h
> 1
> 2
> 3
> 4
> 5
>
> ...
>
> 10
>
> dim(tab) is 10 x 100
>
>
>
> I need to do the t.test for each row on the two subsets of columns, ie to 
> compare a b d group against e f g group at each row.
>
>
> subset 1:
>        a       b       d
> 1
> 2
> 3
> 4
> 5
>
> ...
>
> 10
>
>
> subset 2:
>        e       f       g
> 1
> 2
> 3
> 4
> 5
>
> ...
>
> 10
>
>    10 t.test's for each row for these two subsets will take around 1 min. 
> The prblem is that I have around 1 different combinations of such a 
> subsets. therefore 1min*1
> =1min in the case if I will use "for" loop like this:
>
> n1=1 #number of subset combinations
> for (i1 in 1:n1) {
>
> n2=10 # number of rows
> i2=1
> for (i2 in 1:n1) {
>        t.test(tab[i2,v5],tab[i2,v6])$p.value  #v5 and v6 are vectors 
> containing the veriable names for the two subsets (they are different for 
> each loop)
>        }
>
> }
>
>
> My question is there more efficient way how to do this computations in a 
> short period of time? Any packages, like plyr? May be direct calculations 
> isted of using t.test function?
>
>
> Thank you.
>
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R Founding

2010-09-16 Thread Juliet Hannah
Hi Group,

I have a possibly naive question, but it seems like it fits into this
discussion.

I have observed that when researchers publish findings that are deemed
to be high-impact,
generous funding often follows.

R is used everywhere, and, of course, for many of these projects.  So
my naive question is:
I would have assumed that R should be well-funded because of this; are
there obstacles
in receiving funding from common sources?


Thanks,

Juliet

P.S. Thanks to everyone who has contributed to R and R-help!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] average columns of data frame corresponding to replicates

2010-09-07 Thread Juliet Hannah
Hi Group,

I have a data frame below. Within this data frame there are  samples
(columns) that are measured  more than once. Samples are indicated by
"idx". So "id1" is present in columns 1, 3, and 5. Not every id is
repeated. I would like to create a new data frame so that the repeated
 ids are averaged. For example, in the new data frame, columns 1, 3,
and 5 of the original will be replaced by 1 new column  that is the
mean of these three. Thanks for any suggestions.

Juliet



myData <- data.frame("sample1.id1" =rep(1,10),
"sample1.id2"=rep(2,10),
"sample2.id1" = rep(2,10),
"sample1.id3" = 1:10,
"sample3.id1" = rep(1,10),
"sample1.id4" = 1:10,
"sample2.id2" = rep(1,10))

repeat_ids <- c("id1","id2")
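
A rough sketch of one direction, mainly to make the goal concrete (I am not
sure it is the best way): group the columns by the id suffix after the ".",
then take rowMeans() within each group. Ids that appear only once just come
through unchanged.

ids <- sub("^.*\\.", "", names(myData))   # "id1", "id2", ...
avgData <- as.data.frame(lapply(split(seq_along(myData), ids),
                function(cols) rowMeans(myData[, cols, drop = FALSE])))
avgData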

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error possibly related to sweave, path, and spaces on windows

2010-08-21 Thread Juliet Hannah
Wow, it works! Thanks Erik. Your suggestion worked.

MiKTeX is indeed in the path, so I just modified the line to:

  junk <- system(paste("pdflatex ",latexFiles[i1]),
 intern=TRUE)
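
For the record, if the full path had been needed (say, pdflatex not on the
PATH), quoting it should also get around the space in "Program Files"
(untested sketch):

pdflatex <- "C:\\Program Files\\MiKTeX 2.6\\miktex\\bin\\pdflatex.exe"
junk <- system(paste(shQuote(pdflatex, type = "cmd"), latexFiles[i1]),
               intern = TRUE)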

On Sat, Aug 21, 2010 at 12:07 PM, Erik Iverson  wrote:
> On 08/21/2010 11:02 AM, Juliet Hannah wrote:
>>
>> I have downloaded a file that I don't know how to describe correctly.
>> It contains R code and Latex, and I should be able to reproduce an
>> analysis by running an R script in this folder.
>>
>> There is a line in the R script:
>>
>>   junk<- system(paste("/usr/texbin/pdflatex ",latexFiles[i1]),
>>                  intern=TRUE)
>>
>> that needs to be modified to run on my computer. I use WinEdt with
>> Miktek 2.6 Does anyone have any suggestions on how I can modify the
>> statement above to work on my computer. Or any other suggestions to
>> get it working.
>>
>> On my computer I see that pdflatex.exe is in:
>>
>> C:\Program Files\MiKTeX 2.6\miktex\bin
>>
>> So I made the following modification.
>>
>> junk<- system(paste("C:\\Program Files\\MiKTeX
>> 2.6\\miktex\\bin\\pdflatex ",latexFiles[i1]),
>>                  intern=TRUE)
>>
>> Maybe spaces are causing problems? Here is the error:
>>
>>> source("D:\\My Documents\\Coombes2\\Scripts\\runAll2.R")
>>
>> Writing to file buildRda.cellLinesFromPredictors.tex
>> Processing code chunks ...
>>  1 : echo term verbatim (label=options)
>>  2 : echo term verbatim (label=invokeMatchPredictors)
>>  3 : echo term verbatim (label=getLocation)
>>  4 : echo term verbatim (label=saveStuff)
>>  5 : echo term verbatim (label=sessionInfo)
>>
>> You can now run LaTeX on 'buildRda.cellLinesFromPredictors.tex'
>> Error in system(paste("C:\\Program Files\\MiKTeX
>> 2.6\\miktex\\bin\\pdflatex",  :
>>   C:\Program not found
>>
>> Thanks for any help.
>
> Yes, see the first paragraph of the Details section of ?system.
>
> You can also check your system path, I think MikTeX tries to
> alter it to include its "bin" directory.  Therefore, you might
> get by with just using "pdflatex" as your command in the system
> call.
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] error possibly related to sweave, path, and spaces on windows

2010-08-21 Thread Juliet Hannah
I have downloaded a file that I don't know how to describe correctly.
It contains R code and Latex, and I should be able to reproduce an
analysis by running an R script in this folder.

There is a line in the R script:

  junk <- system(paste("/usr/texbin/pdflatex ",latexFiles[i1]),
 intern=TRUE)

that needs to be modified to run on my computer. I use WinEdt with
MiKTeX 2.6. Does anyone have any suggestions on how I can modify the
statement above to work on my computer, or any other suggestions to
get it working?

On my computer I see that pdflatex.exe is in:

C:\Program Files\MiKTeX 2.6\miktex\bin

So I made the following modification.

junk <- system(paste("C:\\Program Files\\MiKTeX
2.6\\miktex\\bin\\pdflatex ",latexFiles[i1]),
 intern=TRUE)

Maybe spaces are causing problems? Here is the error:

> source("D:\\My Documents\\Coombes2\\Scripts\\runAll2.R")
Writing to file buildRda.cellLinesFromPredictors.tex
Processing code chunks ...
 1 : echo term verbatim (label=options)
 2 : echo term verbatim (label=invokeMatchPredictors)
 3 : echo term verbatim (label=getLocation)
 4 : echo term verbatim (label=saveStuff)
 5 : echo term verbatim (label=sessionInfo)

You can now run LaTeX on 'buildRda.cellLinesFromPredictors.tex'
Error in system(paste("C:\\Program Files\\MiKTeX
2.6\\miktex\\bin\\pdflatex",  :
  C:\Program not found

Thanks for any help.

> sessionInfo()
R version 2.11.1 (2010-05-31)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reading a text file, one line at a time

2010-08-18 Thread Juliet Hannah
Hi Jim,

I was trying to use your template without success. With the toy data
below, could
you explain how to use this template to change all "b"s to "z"s --
just as an exercise, reading
in 3 lines at a time. I need to use this strategy for a larger
problem, but I haven't
been able to get the basics working.

Thanks,

Juliet

myData <- structure(list(V1 = 1:11, V2 = structure(c(2L, 2L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"),
V3 = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L,
3L), .Label = c("a", "b", "c"), class = "factor"), V4 = structure(c(1L,
1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L), .Label = c("a",
"b"), class = "factor"), V5 = c(-0.499071939558026, 1.51341011554134,
1.93754671209923, 0.331061227463955, 0.280752001959284, 0.964635079229074,
0.624397908891502, -0.807600774484419, -1.76452730888732,
0.546080229326458, 12.3)), .Names = c("V1", "V2", "V3", "V4",
"V5"), class = "data.frame", row.names = c(NA, -11L))

On Sun, Aug 15, 2010 at 1:06 PM, jim holtman  wrote:
> For efficiency of processing, look at reading in several
> hundred/thousand lines at a time.  One line read/write will probably
> spend most of the time in the system calls to do the I/O and will take
> a long time.  So do something like this:
>
> con <- file('yourInputFile', 'r')
> outfile <- file('yourOutputFile', 'w')
> while (length(input <- readLines(con, n=1000)) > 0) {
>    for (i in 1:length(input)){
>        ..your one line at a time processing
>    }
>    writeLines(output, con=outfile)
> }
>
> On Sun, Aug 15, 2010 at 7:58 AM, Data Analytics Corp.
>  wrote:
>> Hi,
>>
>> I have an upcoming project that will involve a large text file.  I want to
>>
>>  1. read the file into R one line at a time
>>  2. do some string manipulations on the line
>>  3. write the line to another text file.
>>
>> I can handle the last two parts.  Scan and read.table seem to read the whole
>> file in at once.  Since this is a very large file (several hundred thousand
>> lines), this is not practical.  Hence the idea of reading one line at at
>> time.  The question is, can R read one line at a time?  If so, how?  Any
>> suggestions are appreciated.
>>
>> Thanks,
>>
>> Walt
>>
>> 
>>
>> Walter R. Paczkowski, Ph.D.
>> Data Analytics Corp.
>> 44 Hamilton Lane
>> Plainsboro, NJ 08536
>> 
>> (V) 609-936-8999
>> (F) 609-936-3733
>> w...@dataanalyticscorp.com
>> www.dataanalyticscorp.com
>>
>> _
>>
>>
>> --
>> 
>>
>> Walter R. Paczkowski, Ph.D.
>> Data Analytics Corp.
>> 44 Hamilton Lane
>> Plainsboro, NJ 08536
>> 
>> (V) 609-936-8999
>> (F) 609-936-3733
>> w...@dataanalyticscorp.com
>> www.dataanalyticscorp.com
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lattice xyplots plots with multiple lines per cell

2010-08-16 Thread Juliet Hannah
You may want to check out examples in lattice and ggplot2. Both of
these make plotting subsets much easier. I can't remember the lattice
syntax off the top of my head, but if you post some example data –
either by creating it or using dput – people will be able to help out
easier.  Here is some example data below that has the features you
were looking for (using ggplot2). One note. Learning both of the
packages takes some time, but if you make graphs often,  it will save
you a lot of time in the future.


time = c(1:100)
mymeans = 2*time + rnorm(n=100,mean=10,sd=20)
gender <- sample(c("m","f"),100,replace=TRUE)
grade <- sample(c("old","young"),100,replace=TRUE)
groups <- sample(c("control","intervention"),100,replace=TRUE)

mydata <- data.frame(y=mymeans,time=time,gender=gender,grade=grade,groups=groups)

library(ggplot2)
p <- ggplot(mydata,aes(x=time,y=mymeans))
p <- p + geom_point(aes(colour=groups)) +facet_grid(gender ~ grade)
p


On Fri, Aug 13, 2010 at 5:58 PM, Robin Jeffries  wrote:
> Hello,
>
> I need to plot the means of some outcome for two groups (control vs
> intervention) over time (discrete) on the same plot, for various subsets
> such as gender and grade level. What I have been doing is creating all
> possible subsets first, using the aggregate function to create the means
>  over time, then plotting the means over time (as a simple line plot with
> both control & intervention on one plot) for one subset. I then use par()
> and repeat this plot for each gender x grade level subset so they all appear
> on one page.
>
>
> This appears to me to be very similar to an xyplot, something like
>  mean(outcome) ~ gender + gradelevel. However, I can't figure out how I
> could get both control and intervention lines in the same plot.
>
> Any suggestions? What i'm doing now -works-, but just seems to be the long
> way around.
>
> -Robin
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] partial match of one column in data frame to another character vector

2010-08-10 Thread Juliet Hannah
Here is some data (dput output below)

> myData
  id  group
1   D599  A
2   002-0004  B
3 F01932  A
18  F16   B
19  F28   A
20   A94  B


and a vector of IDs (the full label).

> fullID
[1] "F16-284"  "ACC-A94-AB"   "ADAD599"  "002-0004BCC"
"CDCF01932.AB" "F28DDB"   "NOMATCH-EX"

For each id in myData, there could be a partial match in fullID. For
example D599 in myData matches  ADAD599. I would like to add a column
to myData that contains the corresponding fullID or NA if a match was
not found.

Thanks for your help.

Juliet

#
#Data
#

myData <- structure(list(id = structure(c(6L, 5L, 1L, 2L, 3L, 4L),
.Label = c(" F01932",
"   F16 ", "   F28 ", "  A94",
"   002-0004", " D599"), class = "factor"), group = structure(c(5L,
4L, 1L, 3L, 2L, 3L), .Label = c(" A",
"   A", "   B",
"  B", " A"), class = "factor")), .Names = c("id", "group"
), class = "data.frame", row.names = c("1", "2", "3", "18", "19 ",
"20  "))

fullID <- c("F16-284", "ACC-A94-AB", "ADAD599", "002-0004BCC", "CDCF01932.AB",
"F28DDB", "NOMATCH-EX")

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] replace negative numbers by smallest positive value in matrix

2010-07-15 Thread Juliet Hannah
Thanks Jun and Phil!


On Thu, Jul 15, 2010 at 12:16 PM, Phil Spector
 wrote:
> Juliet -
>   Since you're operating on each column, apply() would
> be the appropriate function;
>
> mymat = apply(mymat,2,function(x){x[x<0] = min(x[x>0]);x})
>
>
>                                        - Phil Spector
>                                         Statistical Computing Facility
>                                         Department of Statistics
>                                         UC Berkeley
>                                         spec...@stat.berkeley.edu
>
>
>
> On Thu, 15 Jul 2010, Juliet Hannah wrote:
>
>> Hi Group,
>>
>> I have a matrix, and I would like to replace numbers less than 0 by
>> the smallest positive value in that column. Below is a
>> small matrix and the loop I used. I would like to get suggestions on
>> the "R way" to do this.
>>
>> Thanks,
>>
>> Juliet
>>
>> # example data set
>>
>> mymat <- structure(c(-0.503183609420937, 0.179063475173256,
>> 0.130473004669938,
>> -1.80825226960127, -0.794910626384209, 1.03857280868547,
>> 0.362120146065799,
>> -2.01124119488992, -1.49083525457822, -0.356354715035589,
>> -0.306686279071398,
>> 0.0789120002882668, 1.50314029609087, -0.0177677865019544,
>> 1.31642572649823,
>> 1.78842032090131, -0.991393884836917, -0.868946528068323,
>> -0.325472385456867,
>> 0.11938394965), .Dim = c(5L, 4L))
>>
>>
>> # replacement of negative numbers
>>
>> for (mycol in 1:ncol(mymat)) {
>>  sort_dat <- sort(mymat[,mycol])
>>  min_pos_index <- min(which(sort_dat >0 ))
>>  min_pos_val <- sort_dat[min_pos_index]
>>  neg_nums <- which(mymat[,mycol]  <= 0)
>>  mymat[neg_nums,mycol] <- min_pos_val
>> }
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] replace negative numbers by smallest positive value in matrix

2010-07-15 Thread Juliet Hannah
Hi Group,

I have a matrix, and I would like to replace numbers less than 0 by
the smallest positive value in that column. Below is a
small matrix and the loop I used. I would like to get suggestions on
the "R way" to do this.

Thanks,

Juliet

# example data set

mymat <- structure(c(-0.503183609420937, 0.179063475173256, 0.130473004669938,
-1.80825226960127, -0.794910626384209, 1.03857280868547, 0.362120146065799,
-2.01124119488992, -1.49083525457822, -0.356354715035589, -0.306686279071398,
0.0789120002882668, 1.50314029609087, -0.0177677865019544, 1.31642572649823,
1.78842032090131, -0.991393884836917, -0.868946528068323, -0.325472385456867,
0.11938394965), .Dim = c(5L, 4L))


# replacement of negative numbers

for (mycol in 1:ncol(mymat)) {
   sort_dat <- sort(mymat[,mycol])
   min_pos_index <- min(which(sort_dat >0 ))
   min_pos_val <- sort_dat[min_pos_index]
   neg_nums <- which(mymat[,mycol]  <= 0)
   mymat[neg_nums,mycol] <- min_pos_val
}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] long to wide on larger data set

2010-07-14 Thread Juliet Hannah
Hi Matthew and Jim,

Thanks for all the suggestions as always. Matthew's post was very
informative in showing how things can be done much more efficiently
with data.table. I haven't had a chance to finish the reshaping
because my group was a in rush,
and someone else decided to do it in Perl. However, I did get a chance
to use the data.table package for the first time. In some preliminary
steps, I had to do some subsetting and recoding and this was superfast
with data.table. The tutorials were helpful in getting me up to speed.
Over the next few days
I plan to carry out the reshaping as a learning exercise so I'll be
ready next time. I'll post my results afterwards.

Thanks,

Juliet

On Mon, Jul 12, 2010 at 11:50 AM, Matthew Dowle  wrote:
> Juliet,
>
> I've been corrected off list. I did not read properly that you are on 64bit.
>
> The calculation should be :
>    53860858 * 4 * 8 /1024^3 = 1.6GB
> since pointers are 8 bytes on 64bit.
>
> Also, data.table is an add-on package so I should have included :
>
>   install.packages("data.table")
>   require(data.table)
>
> data.table is available on all platforms both 32bit and 64bit.
>
> Please forgive mistakes: 'someoone' should be 'someone', 'percieved' should
> be
> 'perceived' and 'testDate' should be 'testData' at the end.
>
> The rest still applies, and you might have a much easier time than I thought
> since you are on 64bit. I was working on the basis of squeezing into 32bit.
>
> Matthew
>
>
> "Matthew Dowle"  wrote in message
> news:i1faj2$lv...@dough.gmane.org...
>>
>> Hi Juliet,
>>
>> Thanks for the info.
>>
>> It is very slow because of the == in  testData[testData$V2==one_ind,]
>>
>> Why? Imagine someoone looks for 10 people in the phone directory. Would
>> they search the entire phone directory for the first person's phone
>> number, starting
>> on page 1, looking at every single name, even continuing to the end of the
>> book
>> after they had found them ?  Then would they start again from page 1 for
>> the 2nd
>> person, and then the 3rd, searching the entire phone directory from start
>> to finish
>> for each and every person ?  That code using == does that.  Some of us
>> call
>> that a 'vector scan' and is a common reason for R being percieved as slow.
>>
>> To do that more efficiently try this :
>>
>> testData = as.data.table(testData)
>> setkey(testData,V2)    # sorts data by V2
>> for (one_ind in mysamples) {
>>   one_sample <- testData[one_ind,]
>>   reshape(one_sample)
>> }
>>
>> or just this :
>>
>> testData = as.data.table(testData)
>> setkey(testDate,V2)
>> testData[,reshape(.SD,...), by=V2]
>>
>> That should solve the vector scanning problem, and get you on to the
>> memory
>> problems which will need to be tackled. Since the 4 columns are character,
>> then
>> the object size should be roughly :
>>
>>    53860858 * 4 * 4 /1024^3 = 0.8GB
>>
>> That is more promising to work with in 32bit so there is hope. [ That
>> 0.8GB
>> ignores the (likely small) size of the unique strings in global string
>> hash (depending
>> on your data). ]
>>
>> Its likely that the as.data.table() fails with out of memory.  That is not
>> data.table
>> but unique. There is a change in unique.c in R 2.12 which makes unique
>> more
>> efficient and since factor calls unique, it may be necessary to use R
>> 2.12.
>>
>> If that still doesn't work, then there are several more tricks (and we
>> will need
>> further information), and there may be some tweaks needed to that code as
>> I
>> didn't test it,  but I think it should be possible in 32bit using R 2.12.
>>
>> Is it an option to just keep it in long format and use a data.table ?
>>
>>   testDate[, somecomplexrfunction(onecolumn, anothercolumn), by=list(V2) ]
>>
>> Why do you need to reshape from long to wide?
>>
>> HTH,
>> Matthew
>>
>>
>>
>> "Juliet Hannah"  wrote in message
>> news:aanlktinyvgmrvdp0svc-fylgogn2ro0omnugqbxx_...@mail.gmail.com...
>> Hi Jim,
>>
>> Thanks for responding. Here is the info I should have included before.
>> I should be able to access 4 GB.
>>
>>> str(myData)
>> 'data.frame':   53860857 obs. of  4 variables:
>> $ V1: chr  "23" "26" "200047" "200050" ...
>> $ V2: chr  "cv0001" "cv0001"

Re: [R] long to wide on larger data set

2010-07-12 Thread Juliet Hannah
Hi Jim,

Thanks for responding. Here is the info I should have included before.
I should be able to access 4 GB.

> str(myData)
'data.frame':   53860857 obs. of  4 variables:
 $ V1: chr  "23" "26" "200047" "200050" ...
 $ V2: chr  "cv0001" "cv0001" "cv0001" "cv0001" ...
 $ V3: chr  "A" "A" "A" "B" ...
 $ V4: chr  "B" "B" "A" "B" ...
> sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

On Mon, Jul 12, 2010 at 7:54 AM, jim holtman  wrote:
> What is the configuration you are running on (OS, memory, etc.)?  What
> does your object consist of?  Is it numeric, factors, etc.?  Provide a
> 'str' of it.  If it is numeric, then the size of the object is
> probably about 1.8GB.  Doing the long to wide you will probably need
> at least that much additional memory to hold the copy, if not more.
> This would be impossible on a 32-bit version of R.
>
> On Mon, Jul 12, 2010 at 1:25 AM, Juliet Hannah  
> wrote:
>> I have a data set that has 4 columns and 53860858 rows. I was able to
>> read this into R with:
>>
>> cc <- rep("character",4)
>> myData <- 
>> read.table("myData.csv",header=FALSE,skip=1,colClasses=cc,nrow=53860858,sep=",")
>>
>>
>> I need to reshape this data from long to wide. On a small data set the
>> following lines work. But on the real data set, it didn't finish even
>> when I took a sample of two (rows in new data). I didn't receive an
>> error. I just stopped it because it was taking too long. Any
>> suggestions for improvements? Thanks.
>>
>> # start example
>> # i have commented out the write.table statement below
>>
>> testData <- read.table(textConnection("rs853,cv0084,A,A
>> rs86,cv0084,C,B
>>  rs883,cv0084,E,F
>>  rs853,cv0085,G,H
>>  rs86,cv0085,I,J
>>  rs883,cv0085,K,L"),header=FALSE,sep=",")
>>  closeAllConnections()
>>
>> mysamples <- unique(testData$V2)
>>
>> for (one_ind in mysamples) {
>>   one_sample <- testData[testData$V2==one_ind,]
>>   mywide <- reshape(one_sample, timevar = "V1", idvar =
>> "V2",direction = "wide")
>> #   write.table(mywide,file
>> ="newdata.txt",append=TRUE,row.names=FALSE,col.names=FALSE,quote=FALSE)
>> }
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] long to wide on larger data set

2010-07-11 Thread Juliet Hannah
I have a data set that has 4 columns and 53860858 rows. I was able to
read this into R with:

cc <- rep("character",4)
myData <- 
read.table("myData.csv",header=FALSE,skip=1,colClasses=cc,nrow=53860858,sep=",")


I need to reshape this data from long to wide. On a small data set the
following lines work. But on the real data set, it didn't finish even
when I took a sample of two (rows in new data). I didn't receive an
error. I just stopped it because it was taking too long. Any
suggestions for improvements? Thanks.

# start example
# i have commented out the write.table statement below

testData <- read.table(textConnection("rs853,cv0084,A,A
rs86,cv0084,C,B
 rs883,cv0084,E,F
 rs853,cv0085,G,H
 rs86,cv0085,I,J
 rs883,cv0085,K,L"),header=FALSE,sep=",")
 closeAllConnections()

mysamples <- unique(testData$V2)

for (one_ind in mysamples) {
   one_sample <- testData[testData$V2==one_ind,]
   mywide <- reshape(one_sample, timevar = "V1", idvar =
"V2",direction = "wide")
#   write.table(mywide,file
="newdata.txt",append=TRUE,row.names=FALSE,col.names=FALSE,quote=FALSE)
}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression - glm() - example in Dalgaard's book ISwR

2010-07-03 Thread Juliet Hannah
You may find both of Alan Agresti's books on categorical data analysis
useful. Try googling both books and then searching for the word
"grouped" within each book. Agresti refers to the difference you
describe as grouped versus ungrouped data. The likelihoods differ,
and all summaries based on the likelihood will also differ.
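
A minimal sketch of that point with simulated data (the variable names
below are illustrative, not Dalgaard's): the coefficients agree, but the
deviance-based summaries do not.

set.seed(1)
x <- gl(2, 50)                                   # one binary predictor, 100 subjects
y <- rbinom(100, 1, ifelse(x == "1", 0.3, 0.6))  # ungrouped 0/1 response

fit_ungrouped <- glm(y ~ x, family = binomial)   # one row per subject

succ <- tapply(y, x, sum)                        # grouped: one row per covariate pattern
tot  <- tapply(y, x, length)
grp  <- factor(names(succ))
fit_grouped <- glm(cbind(succ, tot - succ) ~ grp, family = binomial)

coef(fit_ungrouped); coef(fit_grouped)   # identical point estimates
AIC(fit_ungrouped); AIC(fit_grouped)     # differ, because the likelihoods differ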

On Fri, Jul 2, 2010 at 11:33 PM, Paulo Barata  wrote:
>
> Dear R-list members,
>
> I would like to pose a question about the use and results
> of the glm() function for logistic regression calculations.
>
> The question is based on an example provided on p. 229
> in P. Dalgaard, Introductory Statistics with R, 2nd. edition,
> Springer, 2008. By means of this example, I was trying to
> practice the different ways of entering data in glm().
>
> In his book, Dalgaard provides data from a case-based study
> about hypertension summarized in the form of a table. He shows
> two ways of entering the response (dependent) variable data
> in glm(): (1) as a matrix of successes/failures (diseased/
> healthy); (2) as the proportion of people diseased for each
> combination of independent variable's categories.
>
> I tried to enter the response variable in glm() in a third
> way: by reconstructing, from the table, the original data
> in a case-based format, that is, a data frame in which
> each row shows the data for one person. In this situation,
> the response variable would be coded as a numeric 0/1 vector,
> 0=failure, 1=success, and so it would be entered in glm() as
> a numeric 0/1 vector.
>
> The program below presents the calculations for each of the
> three ways of entering data - the first and second methods
> were just copied from Dalgaard's book.
>
> The three methods produce the same results with regard
> to the estimated coefficients, when the output is seen
> with five decimals (although regression 3 seems to have
> produced slightly different std.errors).
>
> My main question is: Why are the residual deviance, its
> degrees of freedom and the AIC produced by regression 3
> completely different when compared to those produced by
> regressions 1 and 2 (which seem to produce equal results)?
> It seems that the degrees of freedom in regressions 1
> and 2 are based on the size (number of rows) of table d
> (see the output of the program below), but this table is
> just a way of summarizing the data. The degrees of
> freedom in regressions 1 and 2 are not based on the
> actual number of cases (people) examined, which is n=433.
>
> I understand that no matter the way of entering the data
> in glm(), we are always analyzing the same data, which
> are those presented in table format on Dalgaard's page
> 229 (these data are in data.frame d in the program below).
> So I understand that the three ways of entering data
> in glm() should produce the same results.
>
> Secondarily, why are the std.errors in regression 3 slightly
> different from those calculated in regressions 1 and 2?
>
> I am using R version 2.11.1 on Windows XP.
>
> Thank you very much.
>
> Paulo Barata
>
> ##== begin =
>
> ## data in: P. Dalgaard, Introductory Statistics with R,
> ## 2nd. edition, Springer, 2008
> ## logistic regression - example in Dalgaard's Section 13.2,
> ## page 229
>
> rm(list=ls())
>
> ## data provided on Dalgaard's page 229:
> no.yes <- c("No","Yes")
> smoking <- gl(2,1,8,no.yes)
> obesity <- gl(2,2,8,no.yes)
> snoring <- gl(2,4,8,no.yes)
> n.tot <- c(60,17,8,2,187,85,51,23)
> n.hyp <- c(5,2,1,0,35,13,15,8)
>
> d <- data.frame(smoking,obesity,snoring,n.tot,n.hyp)
> ## d is the data to be analyzed, in table format
> ## d is the first table on Dalgaard's page 229
> ## n.tot = total number of cases
> ## n.hyp = people with hypertension
> d
>
> ## regression 1: Dalgaard's page 230
> ## response variable entered in glm() as a matrix of
> ## successes/failures
> hyp.tbl <- cbind(n.hyp,n.tot-n.hyp)
> regression1 <- glm(hyp.tbl~smoking+obesity+snoring,
>                   family=binomial("logit"))
>
> ## regression 2: Dalgaard's page 230
> ## response variable entered in glm() as proportions
> prop.hyp <- n.hyp/n.tot
> regression2 <- glm(prop.hyp~smoking+obesity+snoring,
>                   weights=n.tot,family=binomial("logit"))
>
> ## regression 3 (well below): data entered in glm()
> ## by means of 'reconstructed' variables
> ## variables with names beginning with 'r' are
> ## 'reconstructed' from data in data.frame d.
> ## The objective is to reconstruct the original
> ## data from which the table on Dalgaard's page 229
> ## has been produced
>
> rsmoking <- c(rep('No',d[1,4]),rep('Yes',d[2,4]),
>              rep('No',d[3,4]),rep('Yes',d[4,4]),
>              rep('No',d[5,4]),rep('Yes',d[6,4]),
>              rep('No',d[7,4]),rep('Yes',d[8,4]))
> rsmoking <- factor(rsmoking)
> length(rsmoking)  # just a check
>
> robesity <- c(rep('No', d[1,4]),rep('No', d[2,4]),
>              rep('Yes',d[3,4]),rep('Yes',d[4,4]),
>              rep('No', d[5,4]),rep('No', d[6,4]),
>

Re: [R] reg: R genetics problem

2010-06-23 Thread Juliet Hannah
I've used this package before, and it always gives me the message:

NOTE: THIS PACKAGE IS NOW OBSOLETE.

So I stopped using it. I just tried installing it and it gave me some
new errors (below). Maybe you should also post your sessionInfo().

> library("genetics")
Loading required package: combinat

Attaching package: 'combinat'

The following object(s) are masked from 'package:utils':

combn

Loading required package: gdata
gdata: Unable to locate valid perl interpreter
gdata:
gdata: read.xls() will be unable to read Excel XLS and XLSX files
unless the 'perl=' argument is used to specify the location of a valid
perl intrpreter.
gdata:
gdata: (To avoid display of this message in the future, please ensure
perl is installed and available on the executable search path.)
gdata: Unable to load perl libaries needed by read.xls()
gdata: to support 'XLX' (Excel 97-2004) files.

gdata: Unable to load perl libaries needed by read.xls()
gdata: to support 'XLSX' (Excel 2007+) files.

gdata: Run the function 'installXLSXsupport()'
gdata: to automatically download and install the perl
gdata: libaries needed to support Excel XLS and XLSX formats.

Attaching package: 'gdata'

The following object(s) are masked from 'package:utils':

object.size

Loading required package: gtools
Loading required package: MASS
Loading required package: mvtnorm

NOTE: THIS PACKAGE IS NOW OBSOLETE.

  The R-Genetics project has developed an set of enhanced genetics
  packages to replace 'genetics'. Please visit the project homepage
  at http://rgenetics.org for informtion.


Attaching package: 'genetics'

The following object(s) are masked from 'package:base':

%in%, as.factor, order

> sessionInfo()
R version 2.11.0 (2010-04-22)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] genetics_1.3.4 mvtnorm_0.9-9  MASS_7.3-5 gtools_2.6.2
gdata_2.8.0combinat_0.0-7


On Wed, Jun 23, 2010 at 2:49 AM, A Padmavathi Devi padma
 wrote:
> Dear Sir,
> We need to use genetics packages at our end. We have installed R packages. 
> When i try to choose genetics package and install, it is giving an error. 
> Please see the attachment for the error. How can i resolve it and make it 
> work?
> Thanking you
> A Padmavathi Devi
> Technical Officer
> Centre for Cellular and Molecular Biology
> Ph.No:27192776
>
> "The person addressed in the email is the sole authorized recipient.
> Should you receive it in error, immediately notify the sender of the
> error and delete the e-mail. Any unauthorized dissemination or copying
> of this e-mail (or any attachment to this e-mail) or the wrongful
> disclosure of the information herein contained is prohibited.
> Also note that this form of communication is not secure, it can be
> intercepted, and may not necessarily be free of errors and viruses
> in spite of reasonable efforts to secure this medium."
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to extract the 1st field from a vector of strings

2010-05-31 Thread Juliet Hannah
What is the meaning of "\\1" here? Thanks.

desc <- c("hsa-let-7a MIMAT062 Homo sapiens let-7a","hsa-let-7a*
MIMAT0004481 Homo sapiens let-7a*","hsa-let-7a-2* MIMAT0010195 Homo
sapiens let-7a-2*")

I'm missing something:

> gsub(" MIMA.*", "\\1", desc)
[1] "hsa-let-7a""hsa-let-7a*"   "hsa-let-7a-2*"
> gsub(" MIMA.*", "\\2", desc)
[1] "hsa-let-7a""hsa-let-7a*"   "hsa-let-7a-2*"
> gsub(" MIMA.*", "\\3", desc)
[1] "hsa-let-7a""hsa-let-7a*"   "hsa-let-7a-2*"

On Thu, May 27, 2010 at 10:58 AM, Henrique Dallazuanna  wrote:
> Try this:
>
>  gsub(" MIMA.*", "\\1", desc)
>
> On Thu, May 27, 2010 at 11:37 AM,  wrote:
>
>> I have the following vector of strings (shown only the first 3 elements)
>>
>> > desc[1:3]
>> [1] "hsa-let-7a MIMAT062 Homo sapiens let-7a"
>> [2] "hsa-let-7a* MIMAT0004481 Homo sapiens let-7a*"
>> [3] "hsa-let-7a-2* MIMAT0010195 Homo sapiens let-7a-2*"
>> > is.vector(desc)
>> [1] TRUE
>> > A <- unlist(strsplit(desc[1:3], "  "))
>> > A
>> [1] "hsa-let-7a  MIMAT062 Homo sapiens let-7a"
>> [2] "hsa-let-7a*  MIMAT0004481 Homo sapiens let-7a*"
>> [3] "hsa-let-7a-2*  MIMAT0010195 Homo sapiens let-7a-2*"
>> > as.vector(A)
>> [1] "hsa-let-7a  MIMAT062 Homo sapiens let-7a"
>> [2] "hsa-let-7a*  MIMAT0004481 Homo sapiens let-7a*"
>> [3] "hsa-let-7a-2*  MIMAT0010195 Homo sapiens let-7a-2*"
>> >
>> I would like to extract only the first field (of variable length). That is
>> I need a vector containing
>> "hsa-let-7a "
>> "hsa-let-7a*"
>> "hsa-let-7a-2*"
>>
>> The operator [[]][] works only on the single vector element. I would like
>> to extract the 1st field
>> with one single instruction rather than a loop as traditional programming
>> languages request.
>>
>> Thank you in advance for you help.
>> Maura
>>
>>
>>
>>
>>
>>        [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>
>        [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] suggestions/improvements for recoding strategy

2010-05-17 Thread Juliet Hannah
I am recoding some data. Many values that should be 1.5 are recorded
as 1-2. Some example data and my solution is below. I am curious about
better approaches or any other suggestions. Thanks!

# example input data

myData <- read.table(textConnection("id, v1, v2, v3
a,1,2,3
b,1-2,,3-4
c,,3,4"),header=TRUE,sep=",")
closeAllConnections()

# the first column is IDs so remove that

numdat <- myData[,-1]

# function to change dashes: 1-2 to 1.5

myrecode <- function(mycol)
{
   newcol <- mycol
   newcol <- gsub("1-2","1.5",newcol)
   newcol <- gsub("2-3","2.5",newcol)
   newcol <- gsub("3-4","3.5",newcol)
   newcol <- as.numeric(newcol)

}

newData <- data.frame(do.call(cbind,lapply(numdat,myrecode)))
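
A more general variant of the same idea (an illustrative sketch, not part
of the original post): split each "a-b" string on the dash and take the
midpoint, so new ranges need no extra gsub() call.

recode_range <- function(mycol) {
  sapply(strsplit(as.character(mycol), "-"),
         function(parts) mean(as.numeric(parts)))  # "1-2" -> 1.5, "3" -> 3, blanks -> NaN
}
newData2 <- data.frame(lapply(numdat, recode_range))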

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Questions about ggplot2

2010-05-16 Thread Juliet Hannah
I started with the summarized data, and there are different ways to do
this. For this example, let there be four columns and a corresponding
sum of 1s.

library("ggplot2")
mydf  <- data.frame(colname = c("A","B","C","D"),mycolsum=c(1:4))
p <- ggplot(mydf,aes(x=colname,y=mycolsum))
p <- p + geom_bar(stat = "identity")

# Here is one way a legend would be created, and how to remove it.

library("ggplot2")
mydf  <- data.frame(colname = c("A","B","C","D"),mycolsum=c(1:4))
p <- ggplot(mydf,aes(fill=colname, x=colname,y=mycolsum))
p <- p + geom_bar(stat = "identity")
p + opts(legend.position = "none")


On Thu, May 13, 2010 at 11:33 AM, Christopher David Desjardins
 wrote:
> Hi I have two questions about using ggplot2.
>
> First, I have multiple columns of data that I would like to combine into one
> histogram where each column of data would correspond to one bar in the
> histogram. Each column has 0 or 1s and I want my bars in the histogram to
> correspond to the sum of the 1s in each column. Does that make sense?
>
> Second, is there a way to completely turn off the legend?
>
> Thanks!
> Chris
>
> PS - Please cc me on the email as I'm a digest subscriber.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] doBy and Hmisc on R version 2.11.0

2010-04-23 Thread Juliet Hannah
I should have mentioned that I also tried:

> install.packages("Hmisc")
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package ‘Hmisc’ is not available



On Fri, Apr 23, 2010 at 3:15 PM, David Winsemius  wrote:
>
> On Apr 23, 2010, at 3:09 PM, Juliet Hannah wrote:
>
>> I installed R 2.11.0, and I don't think I can load the doBy package
>> now. Any suggestions?
>>
>>> library("doBy")
>>
>> Loading required package: survival
>> Loading required package: splines
>> Error in loadNamespace(i[[1L]], c(lib.loc, .libPaths())) :
>>  there is no package called 'Hmisc'
>
> Install Hmisc so that it is there when doBy needs it.
>
>> Error: package/namespace load failed for 'doBy'
>>
>>
>>> sessionInfo()
>>
>> R version 2.11.0 (2010-04-22)
>> i386-pc-mingw32
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>> States.1252
>> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] splines   stats     graphics  grDevices utils     datasets
>> methods   base
>>
>> other attached packages:
>> [1] survival_2.35-8
>>
>> loaded via a namespace (and not attached):
>> [1] grid_2.11.0        lattice_0.18-5     Matrix_0.999375-38
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] doBy and Hmisc on R version 2.11.0

2010-04-23 Thread Juliet Hannah
I installed R 2.11.0, and I don't think I can load the doBy package
now. Any suggestions?

> library("doBy")
Loading required package: survival
Loading required package: splines
Error in loadNamespace(i[[1L]], c(lib.loc, .libPaths())) :
  there is no package called 'Hmisc'
Error: package/namespace load failed for 'doBy'


> sessionInfo()
R version 2.11.0 (2010-04-22)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] splines   stats graphics  grDevices utils datasets
methods   base

other attached packages:
[1] survival_2.35-8

loaded via a namespace (and not attached):
[1] grid_2.11.0lattice_0.18-5 Matrix_0.999375-38

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] uninstalling and installing on linux

2010-04-23 Thread Juliet Hannah
>>
>> This has not worked for me, meaning I can still use R, so instead I
>> removed the directory
>>
>> rm -fR R-2.10.1
>>
>> Is one method preferable to another. And what am I doing incorrectly
>> with "make uninstall"?
>
> Don't know, probably not many are using it and it may be fairly untested.
>

If not many are using it, is "make uninstall" not the conventional way
to uninstall? What is
the most commonly used way? I was trying to follow  the R installation
manual, but
maybe I haven't read enough. Thanks.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] uninstalling and installing on linux

2010-04-23 Thread Juliet Hannah
Hi List,

I have a question about uninstalling and installing R on linux, which
I am new to.

> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base


First, I am trying to uninstall my old version of R with "make uninstall".


 [...@head4 ~/R-2.10.1]$ make uninstall
make[1]: Entering directory `/home/merlin/jh1/R-2.10.1/po'
make[1]: Leaving directory `/home/merlin/jh1/R-2.10.1/po'
make[1]: Entering directory `/home/merlin/jh1/R-2.10.1/tests'
make[1]: Nothing to be done for `uninstall'.
make[1]: Leaving directory `/home/merlin/jh1/R-2.10.1/tests'
make[1]: Entering directory `/home/merlin/jh1/R-2.10.1/src'
make[2]: Entering directory `/home/merlin/jh1/R-2.10.1/src/library'
uninstalling packages ...
/bin/sh: line 0: cd: /usr/local/lib64/R/library: No such file or directory
  subdir /usr/local/lib64/R/library not removed
make[2]: Leaving directory `/home/merlin/jh1/R-2.10.1/src/library'
make[2]: Entering directory `/home/merlin/jh1/R-2.10.1/src/modules'
make[2]: Leaving directory `/home/merlin/jh1/R-2.10.1/src/modules'
make[2]: Entering directory `/home/merlin/jh1/R-2.10.1/src/main'
make[2]: Leaving directory `/home/merlin/jh1/R-2.10.1/src/main'
make[2]: Entering directory `/home/merlin/jh1/R-2.10.1/src/unix'
make[2]: Leaving directory `/home/merlin/jh1/R-2.10.1/src/unix'
make[2]: Entering directory `/home/merlin/jh1/R-2.10.1/src/nmath'
make[2]: Nothing to be done for `uninstall'.
make[2]: Leaving directory `/home/merlin/jh1/R-2.10.1/src/nmath'
make[2]: Entering directory `/home/merlin/jh1/R-2.10.1/src/appl'
make[2]: Nothing to be done for `uninstall'.
make[2]: Leaving directory `/home/merlin/jh1/R-2.10.1/src/appl'
make[2]: Entering directory `/home/merlin/jh1/R-2.10.1/src/extra'
make[2]: Nothing to be done for `uninstall'.
make[2]: Leaving directory `/home/merlin/jh1/R-2.10.1/src/extra'
make[2]: Entering directory `/home/merlin/jh1/R-2.10.1/src/include'
make[3]: Entering directory `/home/merlin/jh1/R-2.10.1/src/include/R_ext'
make[3]: Leaving directory `/home/merlin/jh1/R-2.10.1/src/include/R_ext'
make[2]: Leaving directory `/home/merlin/jh1/R-2.10.1/src/include'
make[2]: Entering directory `/home/merlin/jh1/R-2.10.1/src/scripts'
make[2]: Leaving directory `/home/merlin/jh1/R-2.10.1/src/scripts'
make[1]: Leaving directory `/home/merlin/jh1/R-2.10.1/src'
make[1]: Entering directory `/home/merlin/jh1/R-2.10.1/share'
uninstalling share ...
make[1]: Leaving directory `/home/merlin/jh1/R-2.10.1/share'
make[1]: Entering directory `/home/merlin/jh1/R-2.10.1/etc'
uninstalling etc ...
make[1]: Leaving directory `/home/merlin/jh1/R-2.10.1/etc'
make[1]: Entering directory `/home/merlin/jh1/R-2.10.1/doc'
uninstalling doc ...
make[1]: Leaving directory `/home/merlin/jh1/R-2.10.1/doc'
make[1]: Entering directory `/home/merlin/jh1/R-2.10.1/tools'
make[1]: Nothing to be done for `uninstall'.
make[1]: Leaving directory `/home/merlin/jh1/R-2.10.1/tools'
make[1]: Entering directory `/home/merlin/jh1/R-2.10.1/m4'
make[1]: Nothing to be done for `uninstall'.
make[1]: Leaving directory `/home/merlin/jh1/R-2.10.1/m4'

This has not worked for me, meaning I can still use R, so instead I
removed the directory

rm -fR R-2.10.1

Is one method preferable to another? And what am I doing incorrectly
with "make uninstall"?


My second question is about installation. I just installed the new
version for Windows,and it took a few minutes.

On linux, I

downloaded the R tar.gz file
tar xzvf R-2.9.1.tar.gz (old command)
./configure
 make
make check

This process takes about 2 hours. I'm curious about the time it takes,
and am wondering if I am doing anything incorrectly.

Thanks,

Juliet

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with multtest (rawp2adjp)

2010-03-06 Thread Juliet Hannah
Some code to cut and paste would be helpful. The following may help out.

library(multtest)

# create some p-values
p <- runif(100)
p <- sort(p)
p_adj <- mt.rawp2adjp(p, proc="BH", alpha = 0.05)

> str(p_adj)
List of 4
 $ adjp   : num [1:100, 1:2] 0.0142 0.0174 0.0254 0.0258 0.0736 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:2] "rawp" "BH"
 $ index  : int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
 $ h0.ABH : NULL
 $ h0.TSBH: NULL

> p_results <- p_adj$adjp


> is.data.frame(p_results)
[1] FALSE

p_results <- as.data.frame(p_results)

p_results[p_results$BH < 0.8,]
p_subset <- subset(p_results,BH < 0.8)


On Tue, Mar 2, 2010 at 11:04 PM, Sahil Seth  wrote:
> Hello R experts,
> I am trying to analyze this dataset and am stuck on this problem for quite
> some time now.
> I am using mt.rawp2adjp.
> the output that came out was a matrix with two colums since I had asked it
> to calculate the adjusted p values using one method.
> so it has the two columns as: rawp BH
>
> I combined these using cbind with my actual dataframe.
> checked using head all was fine.
>
> thereafter I am trying to extract the rows where the values in BH are below
> a particular value(alpha say 0.05):
> by the command:
> partMult <- subset(multData,BH < 0.05)
> this gives a error saying that
> the operator < is not valid for factors.
> Initially it seemed that the column BH is a factor, but typeof(BH) revealed
> that it is an integer variable.
>
> I also tried converting it into double, and it did convert but then the
> values just changed:
> 0.0008 became 34
> .0009 became say 28 and so on.
>
> It would be great to have your inputs on the issue.
>
> I am currently exploring the mt.reject function.
>
> thanks
> Sahil
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Three most useful R package

2010-03-03 Thread Juliet Hannah
I use rms, lme4, ggplot2 frequently (also lattice and MASS).

On Tue, Mar 2, 2010 at 3:13 PM, Ralf B  wrote:
> Hi R-fans,
>
> I would like put out a question to all R users on this list and hope
> it will create some feedback and discussion.
>
> 1) What are your 3 most useful R package? and
>
> 2) What R package do you still miss and why do you think it would make
> a useful addition?
>
> Pulling answers together for these questions will serve as a guide for
> new users and help people who just want to get a hint where to look
> first. Happy replying!
>
> Best,
> Ralf
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to do: Correlation with "blocks" (or - "repeated measures" ?!) ?

2010-02-28 Thread Juliet Hannah
I didn't follow your question completely. But do a search for
intraclass correlation with nlme or lmer and see if those results
relate to the question you are asking. If so, I would
suggest following up on the mixed model list. I know you
wanted to avoid mixed models, but if I have understood
your question, that is the way to use all of your data
to estimate the parameters you seek.
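
For what it is worth, a minimal sketch of the intraclass-correlation idea
with lmer; the data frame d and the columns subject, x, y are placeholder
names, not taken from your data:

library(lme4)
fit <- lmer(y ~ x + (1 | subject), data = d)
vc  <- VarCorr(fit)
between <- as.numeric(vc$subject)    # between-subject (random intercept) variance
within  <- attr(vc, "sc")^2          # residual variance
icc <- between / (between + within)  # proportion of variance between subjects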

On Thu, Feb 25, 2010 at 10:57 AM, Tal Galili  wrote:
> Hello dear R help group,
>
> I have the following setup to analyse:
> We have about 150 subjects, and for each subject we performed a pair of
> tests (under different conditions) 18 times.
> The 18 different conditions of the test are complementary, in such a way
> that if we were to average over the tests (for each subject), we would get
> no correlation between the tests (between subjects).
> What we wish to know is the correlation (and P value) between the tests, in
> within subjects, but over all the subjects.
>
> The way I did this by now was to perform the correlation for each subject,
> and then look at the distribution of the correlations received so to see if
> it's mean is different then 0.
> But I suspect there might be a better way for answering the same question
> (someone said to me something about "geographical correlation", but a
> shallow search didn't help).
>
> p.s: I understand there might be a place here to do some sort of mixed
> model, but I would prefer to present a "correlation", and am not sure how to
> extract such an output from a mixed model.
>
> Also, here is a short dummy code to give an idea of what I am talking about:
>
> attach(longley)
> N <- length(Unemployed)
> block <- c(
> rep( "a", N),
> rep( "b", N),
>  rep( "c", N)
> )
>  Unemployed.3 <- c(Unemployed + rnorm(1),
> Unemployed + rnorm(1),
> Unemployed + rnorm(1))
>
> GNP.deflator.3 <- c(GNP.deflator + rnorm(1),
> GNP.deflator + rnorm(1),
> GNP.deflator + rnorm(1))
>
> cor(Unemployed, GNP.deflator)
> cor(Unemployed.3, GNP.deflator.3)
> cor(Unemployed.3[block == "a"], GNP.deflator.3[block == "a"])
> cor(Unemployed.3[block == "b"], GNP.deflator.3[block == "b"])
> cor(Unemployed.3[block == "c"], GNP.deflator.3[block == "c"])
>
> (I would like to somehow combine the last three correlations...)
>
>
>
> Any ideas will be welcomed.
>
> Best,
> Tal
>
>
>
>
>
>
> Contact
> Details:---
> Contact me: tal.gal...@gmail.com |  972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)
> --
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with installing "genetics" package

2010-02-22 Thread Juliet Hannah
I just installed it, and it worked fine.

> sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] genetics_1.3.4 mvtnorm_0.9-9  MASS_7.3-4 gtools_2.6.1
gdata_2.7.1combinat_0.0-7

loaded via a namespace (and not attached):
[1] tools_2.10.1

On Fri, Feb 19, 2010 at 10:21 PM,   wrote:
>
> I have tried to install the "genetics" package and am getting and error.
> The log is below. For information, I am using R 2.10.1 on the Windows
> XP operating system. The mirror site I'm using is Michigan Technological
> University.
>
> Your help in this matter will be greatly appreciated.
>
> Murray M Cooper, PhD
> Richland Statistics
> 9800 North 24th St
> Richland, MI, USA 49083
>
> 
> Log from attempted install:
>
>> utils:::menuInstallPkgs()
> also installing the dependencies ‘combinat’, ‘gdata’, ‘gtools’, ‘mvtnorm’
>
> trying URL 'http://cran.mtu.edu/bin/windows/contrib/2.10/combinat_0.0-7.zip'
> Content type 'application/zip' length 25623 bytes (25 Kb)
> opened URL
> downloaded 25 Kb
>
> trying URL 'http://cran.mtu.edu/bin/windows/contrib/2.10/gdata_2.7.1.zip'
> Content type 'application/zip' length 817994 bytes (798 Kb)
> opened URL
> downloaded 798 Kb
>
> trying URL 'http://cran.mtu.edu/bin/windows/contrib/2.10/gtools_2.6.1.zip'
> Content type 'application/zip' length 91801 bytes (89 Kb)
> opened URL
> downloaded 89 Kb
>
> trying URL 'http://cran.mtu.edu/bin/windows/contrib/2.10/mvtnorm_0.9-9.zip'
> Content type 'application/zip' length 202656 bytes (197 Kb)
> opened URL
> downloaded 197 Kb
>
> trying URL 'http://cran.mtu.edu/bin/windows/contrib/2.10/genetics_1.3.4.zip'
> Content type 'application/zip' length 303230 bytes (296 Kb)
> opened URL
> downloaded 296 Kb
>
> package 'combinat' successfully unpacked and MD5 sums checked
> package 'gdata' successfully unpacked and MD5 sums checked
> Error in normalizePath(path) :
>  path[1]="C:\Program Files\R\R-2.10.1\library/gdata": The system cannot find 
> the file specified
>>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unordered Factors For ggplot?

2010-02-21 Thread Juliet Hannah
It would be easier with some example data. Make sure the data is represented by
factors and check the levels and relevel if needed. Something like:

df$day <- factor(df$day, levels = c("30", "29", "20))

Also search the ggplot2 mailing list for factor and order. I think
similar questions
are asked often (although I may not be interpreting your question correctly).



On Wed, Feb 17, 2010 at 1:50 PM, Ryan Garner
 wrote:
>
> I have data that comes into R already ordered. When I use ggplot, it orders
> them which I don't want. How do I fix this without changing
> options("contrast")?
>
> The data I have is number of days:
> 30
> 29
> ...
> 20
> 19
> ...
> 10
> 9
> ...
> 1
>
> When I plot with ggplot, it orders them by the first number only. So 3 ends
> up coming before 29.
> --
> View this message in context: 
> http://n4.nabble.com/Unordered-Factors-For-ggplot-tp1559146p1559146.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hierarchical data sets: which software to use?

2010-02-04 Thread Juliet Hannah
Check out the book

Linear Mixed Models: A Practical Guide Using Statistical Software  by
Brady West.

It sets up analyses, similar to ones you described, in SPSS, R, and
others as well.

In general, I think it is good to know a couple of different packages,
especially
if you plan on doing a lot of data analysis and data manipulation.



On Sun, Jan 31, 2010 at 11:24 PM, Anton du Toit  wrote:
> Dear R-helpers,
>
> I’m writing for advice on whether I should use R or a different package or
> language. I’ve looked through the R-help archives, some manuals, and some
> other sites as well, and I haven’t done too well finding relevant info,
> hence my question here.
>
> I’m working with hierarchical data (in SPSS lingo). That is, for each case
> (person) I read in three types of (medical) record:
>
> 1. demographic data: name, age, sex, address, etc
>
> 2. ‘admissions’ data: this generally repeats, so I will have 20 or so
> variables relating to their first hospital admission, then the same 20 again
> for their second admission, and so on
>
> 3. ‘collections’ data, about 100 variables containing the results of a
> battery of standard tests. These are administered at intervals and so this
> is repeating data as well.
>
> The number of repetitions varies between cases, so in its one case per line
> format the data is non-rectangular.
>
> At present I have shoehorned all of this into SPSS, with each case on one
> line. My test database has 2,500 variables and 1,500 cases (or persons), and
> in SPSS’s *.SAV format is ~4MB. The one I finally work with will be larger
> again, though likely within one order of magnitude. Down the track, funding
> permitting, I hope to be working with tens of thousands of cases.
>
> I am wondering if I should keep using SPSS, or try something else.
>
> The types of analysis I’ll typically will have to do will involve comparing
> measurements at different times, e.g. before/ after treatment. I’ll also
> need to compare groups of people, e.g. treatment / no treatment. Regression
> and factor analyses will doubtless come into it at some point too.
>
> So:
>
> 1. should I use R or try something else?
>
> 2. can anyone advise me on using R with the type of data I’ve described?
>
>
> Many thanks,
>
> Anton du Toit
>
>        [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] convert data frame of values into correlation matrix

2010-01-30 Thread Juliet Hannah
Hi Group,

Consider a data frame like this:

mylabel1 <- rep(c("A","B","C"),each=3)
mylabel2 <- rep(c("A","B","C"),3)
corrs <- c(1,.8,.7,.8,1,.7,.7,.7,1)
 myData <- data.frame(mylabel1,mylabel2,corrs)

myData

  mylabel1 mylabel2 corrs
1        A        A   1.0
2        A        B   0.8
3        A        C   0.7
4        B        A   0.8
5        B        B   1.0
6        B        C   0.7
7        C        A   0.7
8        C        B   0.7
9        C        C   1.0

I would like to find a general way to get this matrix from the above dataframe.

corrmat <- matrix(corrs,nrow=3,byrow=TRUE)
row.names(corrmat) <- c("A","B","C")
colnames(corrmat) <- c("A","B","C")
corrmat
    A   B   C
A 1.0 0.8 0.7
B 0.8 1.0 0.7
C 0.7 0.7 1.0

The solution I have is the one above where I rearrange the data so
that I can just use matrix() on one
of the columns. I am looking for a solution in which I don't have to do this.

Thanks,

Juliet
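
One general approach that avoids the rearranging (an illustrative note
added here, not part of the original question): xtabs() builds the matrix
directly from the long-format data frame, using the labels as dimnames.

corrmat2 <- xtabs(corrs ~ mylabel1 + mylabel2, data = myData)
corrmat2   # a table object; use unclass() if a plain matrix is preferred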

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] simulation of binary data

2010-01-23 Thread Juliet Hannah
Check out the help page of the lrm function in the rms library. To
show how lrm is used,
the examples simulate data for logistic regression. This may give you
some ideas.
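
A minimal sketch of the general idea (illustrative only; it does not add
the high-leverage point you asked about):

set.seed(2)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)        # correlated predictors (some collinearity)
p  <- plogis(-0.5 + 1.2 * x1 + 0.7 * x2)   # inverse logit of the linear predictor
y  <- rbinom(n, 1, p)
fit <- glm(y ~ x1 + x2, family = binomial)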

On Wed, Jan 20, 2010 at 10:41 AM, omar kairan  wrote:
> Hi,
>
> could someone help me with dilemma on the simulation of logistic
> regressiondata with multicollinearity effect and high leverage point..
>
> Thank you
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Eigenvectors and values in R and SAS

2010-01-15 Thread Juliet Hannah
Here is an example that may be helpful.

A <- matrix(c(-3,5,4,-2),nrow=2,byrow=TRUE)
eigs <- eigen(A)

eigs

$values
[1] -7  2

$vectors
   [,1]   [,2]
[1,] -0.7808688 -0.7071068
[2,]  0.6246950 -0.7071068

The eigenvectors may be scaled differently because they are not unique
(or have a different sign), but Ax = lambda x, for an eigenvalue
lambda.

#Ax
A %*% eigs$vectors[,1]

  [,1]
[1,]  5.466082
[2,] -4.372865

# λx
> -7 * eigs$vectors[,1,drop=FALSE]
  [,1]
[1,]  5.466082
[2,] -4.372865


The eigenvectors for proc iml are scaled similarly, but have different signs.

proc iml;
A = {-3 5, 4 -2};
print A;
eigval = eigval(A);
evec = eigvec(A);
print eigval;
print evec;

A_x = A*evec[,1];
lambda_x = eigval[1,1]*evec[,1];
print A_x;
print lambda_x;

quit;

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] data manipulation/subsetting and relation matrix

2009-12-07 Thread Juliet Hannah
Hi List,

Here is some example data.

myDat <- read.table(textConnection("group id
1 101
1 201
1 301
2 401
2 501
2 601
3 701
3 801
3 901"),header=TRUE)
closeAllConnections()

corr_mat <- read.table(textConnection("1  1  .5   0   0   0   0   0   0   0
2 .5   1   0   0   0   0   0   0   0
3  0   0   1   0   0   0   0   0   0
4  0   0   0   1  .5  .5   0   0   0
5  0   0   0  .5   1  .5   0   0   0
6  0   0   0  .5  .5   1   0   0   0
7  0   0   0   0   0   0   1   0   0
8  0   0   0   0   0   0   0   1  .5
9  0   0   0   0   0   0   0  .5   1"),header=FALSE)
closeAllConnections()

corr_mat <- corr_mat[,-1]
colnames(corr_mat) <- myDat$id
rownames(corr_mat) <- myDat$id

I need to subset this data such that observations within a group are not
related, which is indicated by a 0 in corr_mat.

For example, within group 1, 101 and 201 are related, so one of these
has to be selected, say
101. 301 is not related to 101 or 201, so the final set for group 1
consists of 101 and 301. There will always be at least 2 members in
each group. I need to carry out this task for all groups.

One possible final data set looks like:

  group  id
1     1 101
3     1 301
4     2 401
7     3 701
8     3 801

Any suggestions? Thanks!

Juliet
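
One possible greedy approach (an illustrative sketch appended here, not a
posted solution; the result depends on the order the ids are visited):
within each group, keep an id only if it is unrelated (correlation 0) to
every id already kept for that group.

cmat <- as.matrix(corr_mat)
pick_unrelated <- function(ids, cmat) {
  keep <- character(0)
  for (id in as.character(ids)) {
    if (length(keep) == 0 || all(cmat[id, keep] == 0)) keep <- c(keep, id)
  }
  keep
}
kept <- unlist(lapply(split(myDat$id, myDat$group), pick_unrelated, cmat = cmat))
subset(myDat, id %in% kept)   # 101, 301, 401, 701, 801 with the example data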

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Partial correlations and p-values

2009-12-05 Thread Juliet Hannah
Your R code looks correct.

Because this is a straightforward calculation, I would be surprised if there
were any differences with SPSS. It may be worthwhile to check
if SPSS  gives partial correlations or semipartial correlations. For example,
if you take the correlation between

py <- resid(lm(y ~ z1 + z2,data=mydat2))

and

x

where mydat2 has missing values removed, you get 0.47.
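
A small sketch of that distinction, taking mydat2 to be the complete-case
data frame built from the poster's x, y, z1, z2:

mydat2 <- na.omit(data.frame(x, y, z1, z2))
px <- resid(lm(x ~ z1 + z2, data = mydat2))
py <- resid(lm(y ~ z1 + z2, data = mydat2))
cor(px, py)        # partial correlation: both variables adjusted for z1 and z2
cor(mydat2$x, py)  # semipartial: raw x against adjusted y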

On Tue, Dec 1, 2009 at 8:24 PM, dadrivr  wrote:
>
> I am trying to calculate a partial correlation and p-values.  Unfortunately,
> the results in R are different than what SPSS gives.
>
> Here is an example in R (calculating the partial correlation of x and y,
> controlling for z1 and z2):
>
> x <- c(1,20,14,30,9,4,8)
> y <- c(5,6,7,9,NA,10,6)
> z1 <- c(13,8,16,14,26,13,20)
> z2 <- c(12,NA,2,5,8,16,13)
> fmx <- lm(x ~ z1 + z2, na.action = na.exclude)
> fmy <- lm(y ~ z1 + z2, na.action = na.exclude)
> yres <- resid(fmy)
> xres <- resid(fmx)
> cor(xres, yres, use = "p")
> ct <- cor.test(xres, yres)
> ct$estimate
> ct$p.value
>
> R give me:
> r = .65, p = .23
>
> However, SPSS calculates:
> r = .46, p = .70
>
> I think something may be different with R's handling of missing data, as
> when I replace the NA's with values, R and SPSS give the same r-values,
> albeit different p-values still.  I am doing pairwise case exclusion in both
> R and SPSS.  Any ideas why I'm getting different values?  Is something wrong
> with my formula in R?  Any help would be greatly appreciated.  Thanks!
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sampling dataframe

2009-11-28 Thread Juliet Hannah
Here are some options that may help you out. First,
let's put the data in a format that can be cut-and-pasted
into R.

myData <- read.table(textConnection("var1 var2 var3
1 111
2 312
3 813
4 614
51015
6 221
7 422
8 623
9 824
10   1025"),header=TRUE,row.names=1)
closeAllConnections()

or

use dput

myData <- structure(list(var1 = c(1L, 3L, 8L, 6L, 10L, 2L, 4L, 6L, 8L,
10L), var2 = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), var3 = c(1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)), .Names = c("var1", "var2",
"var3"), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10"))


# Select data where var2 == 1

select_v2 <- myData[myData$var2==1,]

# sample two rows of select_v2

sampled_v2 <- select_v2[sample(1:nrow(select_v2),2),]

# select rows of var3 not equal to 1

select_v3 <- myData[myData$var3 != 1,]

# ?rbind may also come in useful.

2009/11/25 Ronaldo Reis Júnior :
> Hi,
>
> I have a table like that:
>
>> datatest
>   var1 var2 var3
> 1     1    1    1
> 2     3    1    2
> 3     8    1    3
> 4     6    1    4
> 5    10    1    5
> 6     2    2    1
> 7     4    2    2
> 8     6    2    3
> 9     8    2    4
> 10   10    2    5
>
> I need to create another table based on that with the rules:
>
> take a random sample by var2==1 (2 sample rows for example):
>
>   var1 var2 var3
> 1     1    1    1
> 4     6    1    4
>
> in this random sample a get the 1 and 4 value on the var3, now I need to
> complete the table with var1==2 with the lines that var3 are not select on
> var2==1
>
> The resulting table is:
>   var1 var2 var3
> 1     1    1    1
> 4     6    1    4
> 7     4    2    2
> 8     6    2    3
> 10   10    2    5
>
> the value 1 and 4 on var3 is not present in the var2==2.
>
> I try several options but without success. take a random value is easy, but I
> cant select the others value excluding the random selected values.
>
> Any help?
>
> Thanks
> Ronaldo
>
>
> --
> 17ª lei - Seu orientador quer que você se torne famoso,
>          de modo que ele possa, finalmente, se tornar famoso.
>
>      --Herman, I. P. 2007. Following the law. NATURE, Vol 445, p. 228.
> --
>> Prof. Ronaldo Reis Júnior
> |  .''`. UNIMONTES/DBG/Lab. Ecologia Comportamental e Computacional
> | : :'  : Campus Universitário Prof. Darcy Ribeiro, Vila Mauricéia
> | `. `'` CP: 126, CEP: 39401-089, Montes Claros - MG - Brasil
> |   `- Fone: (38) 3229-8192 | ronaldo.r...@unimontes.br | chrys...@gmail.com
> | http://www.ppgcb.unimontes.br/lecc | ICQ#: 5692561 | LinuxUser#: 205366
> --
> Favor NÃO ENVIAR arquivos do Word ou Powerpoint
> Prefira enviar em PDF, Texto, OpenOffice (ODF), HTML, or RTF.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need help for graphical representation

2009-11-21 Thread Juliet Hannah
Check out examples in the lattice package and ggplot2 package.
For example let's say you plot points and confidence intervals. These packages
will then allow you to plot these values by group and by combinations of groups.
Look up conditioning and faceting in these packages.
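
A tiny illustration of the faceting idea (the names here are placeholders,
assuming a data frame tasks with columns Post, Dept, Task, AvgSeconds):

library(ggplot2)
ggplot(tasks, aes(x = Task, y = AvgSeconds)) +
  geom_point() +
  facet_grid(Dept ~ Post)   # one panel per Dept/Post combination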

On Wed, Nov 18, 2009 at 7:50 AM, Sunita22  wrote:
>
> Hello
>
> I am unable to find a graph for my data, My data contains following columns:
>
> 1st column: Posts (GM, Secretary, AM, Office Boy)
> 2nd Column: Dept (Finance, HR, ...)
> 3rd column: Tasks (Open the door, Fix an appointment, Fill the register,
> etc.) depending on the post
> 4th column: Average Time required to do the task
>
> So the sample data would look like
> Posts            Dept        Task                       Average time
> Office Boy      HR           Open the door          00:00:09
> Secretary       Finance    Fix an appointment    00.00.30
>                 .          .                        .
>
> I am trying to represent this data in Graphical format, I tried graphs like
> Mosaic plot, etc. But it does not represent the data correctly. My aim is to
> check the "amount of time and its variability for groups of tasks"
>
> Can someone suggest me few graphs for this kind of data?
>
> Thank you in advance
> Regards
> Sunita
> --
> View this message in context: 
> http://old.nabble.com/Need-help-for-graphical-representation-tp26407207p26407207.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] when vectorising does not work: silent function fail?

2009-11-14 Thread Juliet Hannah
>
> Also, you probably get less data copying by using a for() or while() loop
> than by using apply() in this context.


Why may there be less data copying with "for" and "while" compared to apply?


>
> Finally, the overhead of formula parsing and model matrix construction
> repeated thousands of times probably dominates this computation; if it isn't
> just a one-off it would probably be worth a lower-level implementation.
>

Does "lower-level implementation" mean coding this outside of R?

Thanks!

Juliet

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] QQ plotting of various distributions...

2009-09-27 Thread Juliet Hannah
I think it's helpful to show the sampling variability in a QQ plot
under repeated
sampling. An example is given
in Venables, Ripley pg 86. The variance is higher at the tails. Even when the
distributions are the same, the QQ plot does not have to resemble a straight
line because of sampling. I don't think you can think of any one of these as the
"correct" plot.

Also, if the two
data sets have an equal number of points, the empirical qq plot is
simply a plot of
one sorted data set against the other. (Kundu, Statistical Computing, pg 42).
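
Both points are easy to see with a quick sketch (two samples drawn from the same
normal, so any departure from the line is pure sampling noise):

set.seed(1)
x <- rnorm(100); y <- rnorm(100)
qqplot(x, y); abline(0, 1)   # same distribution, yet the tails wander off the line
plot(sort(x), sort(y))       # with equal sample sizes, this is the same picture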


On Sun, Sep 27, 2009 at 9:06 AM, Duncan Murdoch  wrote:
> Eric Thompson wrote:
>>
>> The supposed example of a Q-Q plot is most certainly not how to make a
>> Q-Q plot. I don't even know where to start
>>
>> First off, the two "Q:s in the title of the plot stand for "quantile",
>> not "random". The "answer" supplied simply plots two sorted samples of
>> a distribution against each other. While this may resemble the general
>> shape of a QQ plot, that is where the similarities end.
>>
>
> The empirical quantiles of a sample are simply the sorted values.  You can
> plot empirical quantiles of one sample versus some version of quantiles from
> a distribution (what qqnorm does) or versus empirical quantiles of another
> sample (what Sunil did).  The randomness in his demonstration did two
> things: it generated some data, and it showed the variability of the plot
> under repeated sampling.
>>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in make.names when trying to read.table in if statement

2009-09-25 Thread Juliet Hannah
Does this work for you?

data_list <- list()
filepattern <- "modrate*"  # note: list.files() uses regular expressions, so "^modrate" or glob2rx("modrate*") may be what you intend
all_files <- list.files(pattern = filepattern)
data_list <- lapply(all_files, read.table, header = TRUE, sep = ",")

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Compare a group of line slopes

2009-09-18 Thread Juliet Hannah
The overall test that the slopes differ by group is a test that the
variable*group interaction equals zero. For the specific pairwise contrasts,
searching for post-hoc comparisons in regression should give you some leads.
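
For example, with the sample data you give below, the overall test is the
x:group row of the interaction model (treating group as a factor):

data <- data.frame(x = rep(1:10, 5), y = rnorm(50), group = factor(rep(1:5, each = 10)))
fit <- lm(y ~ x * group, data = data)
anova(fit)   # the x:group line tests whether the five slopes differ overall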

On Tue, Sep 15, 2009 at 10:57 AM, Jun Shen  wrote:
> Hi, all,
>
> I am thinking to compare a group of slopes from regression lines to see if
> they are different overall, and then make specific comparisons between
> groups. How can I achieve that in R?  I searched the archives and there are
> only discussions about comparing two lines a time.  Thanks.
>
> A sample data set is like the following. I would like to compare the
> regression slopes between the five groups. Thanks for your help.
>
> data<-data.frame(x=rep(1:10,5),y=rnorm(50),group=rep(1:5,each=10))
>
> Jun Shen
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Scan and read.table

2009-09-09 Thread Juliet Hannah
Do you run into problems if you use something like:

cc <- rep("numeric",9)
mydata <- read.table("yourdata",header=TRUE,colClasses=cc,skip=1,nrows=numRows)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using an array of strings with strsplit, issue when including a space in split criteria

2009-09-07 Thread Juliet Hannah
I get a different result:

txt <- c("sales to 23 August 2008 published 29 August","sales to 6
September 2008 published 11 September")
strsplit(txt, 'published ', fixed=TRUE)
[[1]]
[1] "sales to 23 August 2008 " "29 August"

[[2]]
[1] "sales to 6 September 2008 " "11 September"

> sessionInfo()
R version 2.9.0 (2009-04-17)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base



On Mon, Sep 7, 2009 at 11:40 AM, Tony Breyal wrote:
> Dear all,
>
> I'm having a problem understanding why a split does not occur with in
> the 2nd use of the function strsplit below:
>
> # text strings
>> txt <- c("sales to 23 August 2008 published 29 August",
> + "sales to 6 September 2008 published 11 September")
>
> # first use
>> strsplit(txt, 'published', fixed=TRUE)
> [[1]]
> [1] "sales to 23 August 2008 " " 29 August"
>
> [[2]]
> [1] "sales to 6 September 2008 " " 11 September"
>
> # second use, but with a space ' ' in the split
>> strsplit(txt, 'published ', fixed=TRUE)
> [[1]]
> [1] "sales to 23 August 2008 " "29 August"
>
> [[2]]
> [1] "sales to 6 September 2008 published 11 September"
>
> Thank you kindly for any help in advance.
> Tony
>
> O/S: Win Vista Ultimate
>> sessionInfo()
> R version 2.9.2 (2009-08-24)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.
> 1252;LC_MONETARY=English_United Kingdom.
> 1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods
> base
>
> other attached packages:
> [1] RODBC_1.3-0
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] permutation test - query

2009-09-03 Thread Juliet Hannah
You may find the multtest package helpful. It implements methods from
Westfall and Young (Resampling based multiple testing).
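
If you end up coding the count yourself, a rough sketch of the permutation you
describe (made-up genotypes and phenotype, not multtest syntax) would be:

set.seed(1)
n     <- 200
snps  <- matrix(rbinom(n * 12, 2, 0.3), ncol = 12)   # 12 SNPs coded 0/1/2
pheno <- rbinom(n, 1, 0.5)                           # dichotomous phenotype
nSig  <- function(y) sum(apply(snps, 2, function(s)
           summary(glm(y ~ s, family = binomial))$coefficients[2, 4]) < 0.05)
perm  <- replicate(1000, nSig(sample(pheno)))        # reshuffle the phenotype each time
mean(perm >= 5)                                      # chance of 5 or more "significant" SNPs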



On Mon, Aug 31, 2009 at 5:37 AM, Yonatan
Nissenbaum wrote:
> Hi,
>
> My query is regarding permutation test and reshuffling of genotype/phenotype
> data
> I have been using the haplo.stats package of R. for haplotype analysis and I
> would like to perform an analysis which I'm requesting your advice.
> I have a data set of individuals genotyped for 12 SNP and a dichotomous
> phenotype.
> At first, I have tested each of those SNP independently in order to bypass
> multiple testing issues and
> I have found that 5 out of those 12 SNP had a p value <0.05.
> The analysis that I would like to run is a permutation simulation (lets say
> 10,000 times) that will estimate the probability to receive
> significant results by chance for 5 SNP or more.
> I'm aware that permutation test are available in R however, I could not find
> the way of writing the right algorithm
> that will count at each run only the cases by which at least 5 SNP (out of
> 12) were significant.
> Any assistance on that matter will be most appreciated.
>
> Many thanks,
>
> Jonathan
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Politomic Regression in R

2009-09-01 Thread Juliet Hannah
Check out Chapter 7 of Laura Thompson's R Companion to Agresti (you can find it
online).

It will show you how to fit proportional odds models (polr in MASS,
and lrm in the Design library) and multinomial
regression models.
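
As a minimal illustration of the proportional-odds route (made-up data; see the
Thompson notes for real worked examples):

library(MASS)
set.seed(1)
x   <- rnorm(200)
y   <- cut(x + rnorm(200), breaks = 3,
           labels = c("low", "mid", "high"), ordered_result = TRUE)
fit <- polr(y ~ x)   # proportional odds model; response must be an ordered factor
summary(fit)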

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Within factor & random factor

2009-08-29 Thread Juliet Hannah
Let's say that location defined a group, and observations may
be more similar in a group. You could account for this similarity
with the following model.

model1 <- lme(X ~ CorP, random = ~1 | location, data = mydata, method = "ML")

This fits a random intercept model grouped by location. This would
assume that the slope of the regression of X on CorP is the same by
group, but the group means differ.

"CorP"  defines the fixed part of the model.

random = ~1 defines the random part (in this case just a random intercept).

method = "ML" specifies the estimation method (maximum likelihood).

 I would recommend reading through Pinheiro and Bates. Brady West's
Linear Mixed Models also has nice examples. It is difficult (for me)
to assess your analysis. For example, I can see how location and
subject would be random if you had repeated measures on subjects
within location. These concepts take some time to understand.  I would
recommend working on this problem with someone experienced in this. In
your studying, if you come across specific questions about nlme or
lme4, there is a mixed model mailing list that is very helpful. One
more note, your response takes values between 0 and 1, so you would
have to make sure the residuals are behaving ok (read up on
diagnostics).
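
For instance, assuming the model1 fit above, a couple of quick checks would be:

plot(model1)                                   # standardized residuals vs fitted values
qqnorm(resid(model1)); qqline(resid(model1))   # rough normality check of the residuals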

Best,

Juliet


2009/8/26 細田弘吉 :
> Hi,
> I am quite new to R and trying to analyze the following data.  I have 28
> controls and 25 patients.  I measured X values of 4 different locations
> (A,B,C,D) in the brain image of each subject.  And X ranges from 0 to 1.
> I think "control or patient" is a between subject factor and location is
> a within subject factor.  So,
>
> controls: 28
> patients: 25 (unbalanced data set)
> respone measure: X values (ranging 0 to 1)
> fixed factor: control vs. patient (between subject factor)
> random factor: location (level: A,B,C,D ;no order) (within subject factor)
> random factor: subjectID 1-53
>
> My data looks like this;
>
> CorP     X       location  subjectID
> control  0.708   A         1
> control  0.648   A         2
> patient  0.638   C         3
> control  0.547   D         4
> patient  0.632   B         5
> control  0.723   C         6
> ...
>
> I want to know
> (a) if there is a significant difference between controls and patients
> in X values.
> (b) where (A,B,C,D?) the difference is between controls and patients in
> X values.  (There may be an interaction)
>
> I constructed linear mixed model with lme as followings;
>
> (1) model1 <- lme(X ~ CorP*location, random= ~ 1| subjectID, mydata)
>
> (2) model2 <- lme(X ~ CorP*location, random= ~ location| subjectID, mydata)
>
> I am not familiar with lme syntax.  I'm just wondering which formula
> [(1) or (2)] is appropriate for my model to know answers of (a) and (b)
> questions.  Or may be both of the formulas are wrong.
>
> I would appreciate it very much if somebody could help me.
>
> Sincerely,
>
> Kohkichi Hosoda
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How can I do a generic specification in multiple logistic regression

2009-08-25 Thread Juliet Hannah
Is multinom the function you are looking for?

library(nnet)
library(MASS)

?multinom indicates that this
fits multinomial log-linear models. If you are looking for multiple
logistic regression
you may want to read up on glm or lrm from the Design package.

Could you elaborate on what you mean by a generic coefficient?

Best,

Juliet

On Fri, Aug 14, 2009 at 3:46 AM, Benjamin Geckle wrote:
> I use the function multinom to estimate a multiple logistic regression, but
> I need one coefficient to be generic. How can I do this? Or is it possible to
> do this with another function?
>
>
>
> As example  this is what I get:
>
>
> Call:
>
> multinom(formula = choice ~ time + costs, data = DATA)
>
>
>
> Coefficients:
>
>  (Intercept)        time             costs
>
> 1    4.79            -0.28          -3.97
>
> 2   -3.30             0.053         0.59
>
> 3    2.10            -0.048        -0.65
>
>
>
> But I would like to get the costs parameters as generic attribute, so that I
> have the following result:
>
> Coefficients:
>
>  (Intercept)        time           costs
>
> 1    4.79            -0.3            -1.8
>
> 2   -3.36             0.4            -1.8
>
> 3    2.10            -0.5            -1.8
>
>
>
>
> Thank you for you help.
>
> Best Regards
>
> Benjamin
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange package installation error

2009-08-21 Thread Juliet Hannah
Hi Janet,

Were you able to install the package? I just installed it without
problems. I don't
think there should be any issues installing it. If it has not worked
yet, make sure your
R is updated, and if it is, maybe reinstall it.
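
One other thought: if you do want to install from the local tar.gz you
downloaded, you probably need to tell install.packages it is a source file
rather than the name of a repository package, e.g.

install.packages("/Users/janet/Desktop/Camino-downloads/DiagnosisMed_0.2.2.1.tar.gz",
                 repos = NULL, type = "source")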

Best,

Juliet

On Mon, Aug 17, 2009 at 8:43 PM, Janet
Rosenbaum wrote:
> Hi.  I'm trying to install a new package.  I'm a relatively long-time
> (though not advanced) R user and have never seen this error before.
> For the first example, I tried a few different CRAN mirrors.  In the
> second example, the file does exist; I downloaded it from the CRAN
> website for the package and pasted in the name exactly a few different
> times to make sure it was right.
>
>> install.packages("DiagnosisMed")
> Warning message:
> package ‘DiagnosisMed’ is not available
>> install.packages("/Users/janet/Desktop/Camino-downloads/DiagnosisMed_0.2.2.1.tar.gz")
> Warning message:
> package ‘/Users/janet/Desktop/Camino-downloads/DiagnosisMed_0.2.2.1.tar.gz’
> is not available
>
> Permissions:
> [Macintosh:~/Desktop/Camino downloads] janet% ls -l 
> DiagnosisMed_0.2.2.1.tar.gz
> -rwxr-xr-x@ 1 janet  janet  42731 Aug 17 17:31 DiagnosisMed_0.2.2.1.tar.gz
>
> Any ideas?
>
> Thanks for your help.
>
> Janet
> --
> Janet Rosenbaum, Ph.D., A.M.
> Postdoctoral Fellow, Johns Hopkins Bloomberg School of Public Health
> ja...@post.harvard.edu
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] xyplot and subscripts

2009-08-13 Thread Juliet Hannah
I'm not sure how to do this in lattice, but here is an option
with ggplot2.

library(ggplot2)
set.seed(123)

# Make sure the data has a variable that indicates
# which group is red and which one is black
DF <- data.frame(x = rnorm(10), y = rnorm(10), gr = rep(1:5, 2),
                 endpoint = c(rep("Red_Group", 5), rep("Black_Group", 5)))
p <- ggplot(DF, aes(x = x, y = y, group = gr))
p <- p + geom_line()
p <- p + geom_point(aes(colour = endpoint))
# Manually specify that you want the colors to be black and red
p <- p + scale_colour_manual("endpoint",
                             values = c("Black_Group" = "black", "Red_Group" = "red"))
p

Best,

Juliet

On Thu, Aug 6, 2009 at 10:27 AM, Kito Palaxou wrote:
>
> Hi all,
>
> I have a data frame like this:
>
> DF <- data.frame(x = rnorm(10), y = rnorm(10), gr = rep(1:5, 2))
>
> and I make the following xy-plot:
>
> library(lattice)
> xyplot(y ~ x, data = DF, groups = gr, type = "b", col = 1)
>
>
> Is it possible for the two points that belong to the same group (specified by
> DF$gr) to have different colors -- that is, for each pair of points connected
> by the line, I want one to be black and the other red. I tried:
>
> xyplot(y ~ x, data = DF, groups = gr, type = "b", col = 1:2)
>
>
> but it doesn't work. Is there any solution for this??
>
> Thanks a lot!
>
> Kito
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plotting points in random but different colors based on condition

2009-08-05 Thread Juliet Hannah
Maybe this is helpful.

Install ggplot2.

#Create a small example
x <- seq(1:20)
y <- (2*x) + rnorm(length(x),0,1)
id <- rep(1:5,each=4)
dat <- data.frame(x,y,id)

library(ggplot2)
p <- ggplot(dat,aes(x=x,y=y,colour=factor(id)))
p <- p + geom_point()
p

If this is not the correct structure, maybe including an example will
help. For example,
you can output your data or a subset with dput.

dput(dat)
structure(list(x = 1:20, y = c(3.52519254181276, 5.24044281644567,
7.47776852203483, 7.4451075539517, 9.85463703204831, 12.533885478775,
14.2364347537978, 15.6031588004515, 17.2400819084948, 20.2053499115504,
21.4994848677570, 23.6354345765166, 24.5523452698087, 27.3185637999515,
28.5812163905758, 32.6805764070542, 37.0894956595204, 35.0009332656281,
39.6077996102053, 41.0996392969496), id = c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L)), .Names = c("x",
"y", "id"), row.names = c(NA, -20L), class = "data.frame")

Best,

Juliet

On Wed, Aug 5, 2009 at 7:00 PM, per freem wrote:
> hi all,
>
> suppose I have a file with several columns, and I want to plot all points
> that meet a certain condition (i.e. all points in columns A and B that have
> a value greater than such and such in column C of the data) in a new but
> random color.  That is, I want all these points to be colored differently
> but I dont care what color.  The only concern is that the points will be
> colored as differently from each other as possible.
>
> The specific example I have is a file with three columns, X, Y and ID.  I
> want to plot all rows from X, Y (i.e. all points) that have the same value
> (say 1) in their ID column as one color, all points from X, Y that have the
> same ID column value (say 2) as a different color, etc.  I dont know ahead
> of time how many values the ID column will have so I can't write a separate
> plot statement for each of these sets of X, Y rows that have the same ID
> value.
>
> Is there a way to express this in R?
>
> thank you.
> Is there a way to do this?
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] suggestion for paired t-tests

2009-07-25 Thread Juliet Hannah
Hi Jack,

Maybe this helps.

#  make some data

set.seed(123)
condition <- factor(rep(c("a","b"), each = 5))
score <- rnorm(10);
lg <- data.frame(condition, score)

# Carry out commands

a <- subset(lg,condition=="a")["score"]
b <- subset(lg,condition=="b")["score"]

t.test(a,b,paired=TRUE)

#Error in `[.data.frame`(y, yok) : undefined columns selected

# a and b are still data frames

#So this works. We must refer to score, which is being compared.

t.test(a$score,b$score,paired=TRUE)


a=a[,1]
b=b[,1]

# This now works because a and b are vectors
# so we don't need $ to access score

t.test(a,b, paired=TRUE)

Regards,

Juliet

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simulate residuals with different properties for a linear model (regression)

2009-07-20 Thread Juliet Hannah
Here are a couple of examples.

# residuals not normal
n <- 100;
x = seq(n)
y = 10 + 10 *x + 20 * rchisq(n,df=2)
non_normal_lm = lm(y~x)

#non-constant variance
n <- 100;
x = seq(n)
y = 100 + 3 * x + rnorm(n,0,3) * x;
het_var_lm = lm(y~x)

#For each of these try:
plot(non_normal_lm)
plot(het_var_lm)

#or specify which one you want
plot(non_normal_lm,which=1)
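
For the influential-value part of your question, one possibility (a made-up point
with high leverage and a large residual) is:

n <- 30
x <- c(seq(n), 80)                        # one x value far from the rest (high leverage)
y <- c(10 + 3 * seq(n) + rnorm(n), 300)   # its y pulled well off the fitted line
infl_lm <- lm(y ~ x)
plot(infl_lm, which = 5)                  # residuals vs leverage, with Cook's distance contours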

Best,

Juliet

On Mon, Jul 20, 2009 at 2:16 PM, Friedericksen
Hope wrote:
> Hey guys,
>
> for educational purposes I wonder if it is possible to simulate
> different data sets (or specifically residuals) for a linear regression.
> I would like to show my students residuals with different means,
> variances and distributions (normal, but also not normal) in the plots
> created with the plot command for an lm object. In addition it would be
> nice to also simulate influential values (high Cook's distance and
> leverage).
>
> lm.results <- lm(y~x,data)
> plot(lm.results)
>
> Is there an easy way to do this? Or can this be done at all (and if yes,
> any hints?:-)
>
> Thanks and Greetings!
> Friedericksen
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] c-index validation from Design library

2009-07-17 Thread Juliet Hannah
Hi Group,

I have a question about obtaining the bias-corrected c-index using
validate from the Design library.

As an example, consider the example from help page:

library(Design)
?validate.lrm

n <- 1000
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol <- rnorm(n, 200, 25)
sex <- factor(sample(c('female','male'), n, TRUE))


L <- .4*(sex=='male') + .045*(age-50) +
     (log(cholesterol - 10) - 5.2) * (-2*(sex=='female') + 2*(sex=='male'))

y <- ifelse(runif(n) < plogis(L), 1, 0)

f <- lrm(y ~ sex*rcs(cholesterol)+pol(age,2)+blood.pressure, x=TRUE, y=TRUE)

validate(f, B=100)

The output does not include c, but it does include Dxy. The bias
corrected Dxy = 0.280.

Is it correct for me to say that the bias corrected c-index is:
0.280/2 + 0.5 = 0.64?

Also, I have seen this described as the c-index, which is a
generalization of the c-statistic. Is there
a difference? I thought both of these quantities refer to the area
under the ROC.

Thanks!

Juliet

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] strategy to iterate over repeated measures/longitudinal data

2009-07-15 Thread Juliet Hannah
Hi Group,

Create some example data.

set.seed(1)
wide_data <- data.frame(
  id = 1:10,
  predictor1 = sample(c("a","b"), 10, replace = TRUE),
  predictor2 = sample(c("a","b"), 10, replace = TRUE),
  predictor3 = sample(c("a","b"), 10, replace = TRUE),
  measurement1 = rnorm(10),
  measurement2 = rnorm(10))

head(wide_data)

  id predictor1 predictor2 predictor3 measurement1 measurement2
1  1          a          a          b  -0.04493361  -0.05612874
2  2          a          a          a  -0.01619026  -0.15579551
3  3          b          b          b   0.94383621  -1.47075238
4  4          b          a          a   0.82122120  -0.47815006
5  5          a          b          a   0.59390132   0.41794156
6  6          b          a          a   0.91897737   1.35867955

The measurements are repeated measures, and I am looking at one
predictor at a time. In the actual problem, there are around 400,000
predictors (again, one at a time).

Now, I want to use multiple measurements (the responses) to run a
regression of measurements on a predictor. So I will convert this data
from wide to long format.

I want to iterate through each predictor. So one (inefficient) way is
shown below.

For each predictor:
1. create a long data set using the predictor and all measurements
(using make.univ function from  multilevel package)
2. run model, extract the coefficient of interest
3. go to next predictor

The end result is a vector of 400,000 coefficients.

I'm sure this can be improved upon. I will be running this on a unix
cluster with 16G.
In the wide format, there are 2000 rows (individuals). With 4 repeated
measures,  it seems converting everything up front could be
problematic. Also, I'm not sure how to iterate through that (maybe
putting it in a list).  Any suggestions?

Thanks for your help.

Juliet


Here is the inefficient, working code.

library(multilevel)
library(lme4)

#Same data as above
set.seed(1)
wide_data <- data.frame(
  id = 1:10,
  predictor1 = sample(c("a","b"), 10, replace = TRUE),
  predictor2 = sample(c("a","b"), 10, replace = TRUE),
  predictor3 = sample(c("a","b"), 10, replace = TRUE),
  measurement1 = rnorm(10),
  measurement2 = rnorm(10))


#vector of names to iterate over
predictor_names <- colnames(wide_data)[2:4]
#vector to store coefficients
mycoefs <- rep(-1,length(predictor_names))
names(mycoefs) <- predictor_names
for (predictor in predictor_names) {
  long_data <- make.univ(data.frame(wide_data$id, wide_data[, predictor]),
                         data.frame(wide_data$measurement1,
                                    wide_data$measurement2))
  names(long_data) <- c('id', 'predictor', 'time', 'measurement')
  myfit <- lmer(measurement ~ predictor + (1|id), data = long_data)
  mycoefs[predictor] <- myfit@fixef[2]   # fixed-effect coefficient for the predictor
}

mycoefs

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] correct way to subset a vector

2009-07-09 Thread Juliet Hannah
Hi,

#make example data
dat <- data.frame(matrix(rnorm(15),ncol=5))
colnames(dat) <- c("ab","cd","ef","gh","ij")

If I want to get a subset of the data for the middle 3 columns, and I
know the names of the start column and the end column, I can do this:

mysub <- subset(dat,select=c(cd:gh))

If I wanted to do this just on the column names, without subsetting
the data, how could I do this?

mynames <- colnames(dat);

#mynames
#[1] "ab" "cd" "ef" "gh" "ij"

Is there an easy way to create the vector c("cd","ef","gh") as I did
above using something similar to cd:gh?

Thanks,

Juliet

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] skip the error to continue the logistic regression in a loop

2009-07-04 Thread Juliet Hannah
Here are two things to try.

First check the data.  There may be a factor that does not have
variation in the sample. For example,
if you had a predictor such as 'present'/'absent', in the current
sample, all of them
may be 'present'.
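
For example, a quick way to spot such columns (assuming 'data' is the matrix you
pass to apply) is:

sapply(as.data.frame(data), function(x) length(unique(x)))   # a 1 means no variation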

Second, you can put a 'try' statement in your function.


try( myglm <- glm(Response ~ Predictor, family=binomial, data=myData) )

See ?try.
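
A rough sketch of how the two ideas can fit together (Response, myData, and data
are placeholders from your own setup, and reg.fun.safe is hypothetical):

reg.fun.safe <- function(Predictor) {
  fit <- try(glm(Response ~ Predictor, family = binomial, data = myData), silent = TRUE)
  if (inherits(fit, "try-error")) return(c(NA, NA))   # skip this variable and move on
  coef(fit)
}
# results <- apply(data, 2, reg.fun.safe)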


On Thu, Jul 2, 2009 at 2:48 PM, Suyan Tian wrote:
> Hi, everyone:
>
> I am running logistic regression on a bunch of variables using the apply
> command. But when an error occurs, the whole process stops. I am wondering if
> anyone knows how to skip this error and continue the regression for the
> rest of the variables.
>
> What I did was first define a function for the logistic regression, then
> use
>
> apply(data, 2, reg.fun)
>
> Then I got an error which is
>
> [1] "The result of logistic regression:"
> Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
>  contrasts can be applied only to factors with 2 or more levels
>
> Can anyone help me out? Thanks a lot.
>
>
> Suyan
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [Repost][Off Topic] Pointers needed for breakthrough in statistics

2009-06-19 Thread Juliet Hannah
You may find the following two books useful:

Lehmann, Reminiscences of a Statistician (Springer).

David Salsburg, The Lady Tasting Tea.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] learning about panel functions in lattice

2009-06-14 Thread Juliet Hannah
Hi All,

I am trying to understand panel functions. Let's use this example.


library(lattice)
time<-c(rep(1:10,5))
y <-time+rnorm(50,5,2)
group<-c(rep('A',30),rep('B',20))
subject<-c(rep('a',10),rep('b',10),rep('c',10),rep('d',10),rep('e',10))
myData <-data.frame(subject,group,time,y)
head(myData)

Plot 1

xyplot(y ~ time | group, data = myData, groups = subject,
   panel = panel.superpose,
   panel.groups = function(...)
   panel.loess(...)
   )


Plot 2

xyplot(y ~ time | group, data = myData, groups = subject,
   panel = function(...) {
   panel = panel.superpose(...)
   panel.groups = function(...)
   panel.loess(...)
   })

Why does Plot 2 not produce the loess curves? Why are points not
plotted on Plot 1?

Plot 3

xyplot(y ~ time | group, data = myData, groups = subject,
panel = function(...) {
  panel.superpose(...)
  panel.superpose(panel.groups = panel.loess, ...)
})

What is the effect of writing panel.superpose twice? What is it about
this call that makes the data points
appear on the graph?


Thanks,

Juliet

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to set a filter during reading tables

2009-05-31 Thread Juliet Hannah
There are several things you can tell read.table to make it faster.

First, as mentioned, setting colClasses helps. I think telling read.table how
many rows and columns there are also helps.

When this was not sufficient,  I've had to do the data processing
using Python, Perl, or awk.

If that had not been convenient I would have tried the sqldf solution that was
mentioned.

That covers all the options I'm familiar with. I'm also curious about other ways
to selectively read in rows in R. Let me know what ends up working.
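
One more option on the row side, since awk is already in your toolkit: filter the
rows in a pipe before R reads them. This is only a sketch; the column number and
threshold are made up, and it assumes the BLS file has been downloaded locally:

Chemicals <- read.table(pipe("awk -F'\\t' 'NR == 1 || $3 > 100' wp.data.7.Chemicals"),
                        header = TRUE, sep = "\t", as.is = TRUE)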



On Sun, May 31, 2009 at 2:17 PM,   wrote:
> Since there are many rows, using read.table we spend too much time reading
> in rows that we do not want. We are wondering if there is a way to read
> only rows that we are interested in. Thanks,
>
> -james
>> I think you can use readLines(n=1) in loop to skip unwanted rows.
>>
>> On Mon, Jun 1, 2009 at 12:56 AM,   wrote:
>>> Thanks, Juliet.
>>> It works for filtering columns.
>>> I am also wondering if there is a way to filter rows.
>>> Thanks again.
>>> -james
>>>
 One can use colClasses to set which columns get read in. For the
 columns you don't
 want you can set those to NULL. For example,

 cc <- c("NULL",rep("numeric",9))

 myData <-
 read.table("myFile.txt",header=TRUE,colClasses=cc,nrow=numRows).


 On Wed, May 27, 2009 at 12:27 PM,   wrote:
> We are reading big tables, such as,
>
> Chemicals <-
> read.table('ftp://ftp.bls.gov/pub/time.series/wp/wp.data.7.Chemicals',header
> = TRUE, sep = '\t', as.is =T)
>
> I was wondering if it is possible to set a filter during loading so
> that
> we just load what we want not the whole table each time. Thanks,
>
> -james
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

