Re: [R] histogram with bars colored according to a vector of values

2013-07-26 Thread Jim Lemon

On 07/26/2013 12:13 PM, john d wrote:

Dear all,

Let's say I have the following data.frame:

dat-data.frame(x=rnorm(100), y=rnorm(100,2))

and I plot a histogram of variable x, somethink like:
hist(dat$x, breaks=-5:5)

Now, I'd like to color each bar according to the mean of the cases
according to y. For instance, the color of the bar between -2 and -1 should
reflect the mean of variable y for the corresponding cases. Any suggestions?

John


Hi John,
Try this:

dat$xcut-cut(dat$x,breaks=-5:5)
library(plotrix)
barcol-color.scale(by(dat$y,dat$xcut,mean),extremes=c(red,blue))
hist(dat$x,breaks=-5:5,col=barcol)

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with apply (lapply or sapply not sure)

2013-07-26 Thread PIKAL Petr
Hi

I would just use for cycle. Something like

lll - vector(list, length(InputFiles))
for (i in InputFiles) {

everything what you want to do for each file
you can store it in a list
lll[[i]] - whatever you want to store there

}

After populating list you can do anything with it.

Regards
Petr

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Robert Lynch
 Sent: Friday, July 26, 2013 2:46 AM
 To: r-help@r-project.org
 Subject: [R] help with apply (lapply or sapply not sure)
 
 I am reading in a bunch of files and then processing them all in the
 same
 way.
 
 I am sure there as a better way then to copy and past the code for each
 file. Here is what I've done so far
 
 InputFiles-
 as.character(list.files(~/ISLE/RWork/DataWarehouseMining/byCourse/))
 #Path to the Course data files
 
 for (i in InputFiles) {
   #
 print(head(read.csv(paste(~/ISLE/RWork/DataWarehouseMining/byCourse/,
 i, sep=
   print(paste(Reading file
 ~/ISLE/RWork/DataWarehouseMining/byCourse/, i,
 sep=,))
   assign(i,
 read.csv(paste(~/ISLE/RWork/DataWarehouseMining/byCourse/, i,
 sep=)))
 }#note last file is NOT a course file by the student information.
 Master-StudentInfoForRobertWUnitAt7A_2.csv #this is the last file
 
 
 CourseFiles -InputFiles[- c(15,16)] # ignore the student info
 ...7A.csv 
 ...7A_2.csv
 
 #for each file I do the following
 #Bis 101
 summary(BigInstBIS101.csv)
 B101 - BigInstBIS101.csv[-c(3,4,8)]
 summary(B101)
 B101$WH_ID - as.factor(B101$WH_ID)
 B101$SID - as.factor(B101$SID)
 B101$TERM - as.factor(B101$TERM)
 B101$CRN - as.factor(B101$CRN)
 B101$CRN_TRM - as.factor(B101$CRN_TRM)
 B101$INST_NUM - as.factor(B101$INST_NUM)
 B101$zGrade - with(B101, ave(GRADE., list(TERM, INST_NUM), FUN =
 scale))
 write.csv(B101,B101.csv, row.names = FALSE)
 
 #Bis 2A
 B2A - BigInstBIS2A.csv[-c(3,4,8)]
 summary(B2A)
 B2A$WH_ID - as.factor(B2A$WH_ID)
 B2A$SID - as.factor(B2A$SID)
 B2A$TERM - as.factor(B2A$TERM)
 B2A$CRN - as.factor(B2A$CRN)
 B2A$CRN_TRM - as.factor(B2A$CRN_TRM)
 B2A$INST_NUM - as.factor(B2A$INST_NUM)
 B2A$zGrade - with(B2A, ave(GRADE., list(TERM, INST_NUM), FUN = scale))
 write.csv(B2A,B2A.csv, row.names = FALSE)
 
 
 And so on for another 12 courses, however I am changing what I am doing
 as
 part of the reading in the file and don't want to replace the code in
 14
 different places.
 suggestions?  Thanks!
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Repeated measures Cox regression ??coxph??

2013-07-26 Thread Göran Broström

On 07/26/2013 04:06 AM, John Sorkin wrote:

David Thank you for your thoughts. The data I am analyzing do not
come from a clinical trial but rather from a cohort study whose aim
is to determine risk factors for surgical therapy to treat their
joints. John


As David explained, there are several ways of approaching this 
situation. If it is a treatment/control case (which yours isn't), 
stratification is appropriate. It boils down to a simple sign test 
where we compare the number of pairs with longest survival of the 
treated with the number of pairs with the longest survival of the 
control. Undetermined (due to censoring) pairs are thrown away.


Generally, stratification is an alternative to the frailty model, but it 
has some drawbacks: loss of power (especially with small stratum sizes), 
and you cannot use covariates that are constant within pairs (personal 
characteristics in your case). The frailty model comes with stronger 
assumptions than stratification, but you avoid the drawbacks just 
mentioned. The clustering method, finally, is for variance correction in 
the ordinary Cox regression.


In your case, would recommend the frailty approach with coxme (while we 
wait for Terry's verdict!).


Göran


Sent from my iPhone

On Jul 25, 2013, at 9:15 PM, Marc Schwartz marc_schwa...@me.com
marc_schwa...@me.com wrote:



On Jul 25, 2013, at 4:45 PM, David Winsemius
dwinsem...@comcast.net wrote:



On Jul 25, 2013, at 12:27 PM, Marc Schwartz wrote:



On Jul 25, 2013, at 2:11 PM, John Sorkin
jsor...@grecc.umaryland.edu wrote:


Colleagues, Is there any R package that will allow one to
perform a repeated measures Cox Proportional Hazards
regression? I don't think coxph is set up to handle this type
of problem, but I would be happy to know that I am not
correct. I am doing a study of time to hip joint replacement.
As each person has two hips, a given person can appear in the
dataset twice, once for the left hip and once for the right
hip, and I need to account for the correlation of data from a
single individual. Thank you, John




John,

See Terry's 'coxme' package:

http://cran.r-project.org/web/packages/coxme/index.html


When I looked over the description of coxme, I was concerned it
was not really designed with this in mind. Looking at Therneau
and Grambsch, I thought section 8.4.2 in the 'Multiple Events per
Subject' Chapter fit the analysis question well. There they
compared the use of coxph( ...+cluster(ID),,...)  withcoxph(
...+strata(ID),,...). Unfortunately I could not tell for sure
which one was being described as superio but I think it was the
cluster() alternative. I seem to remember there are discussions
in the archives.



David,

I think that you raise a good point. The example in the book (I had
to wait to get home to read it) is potentially different however,
in that the subject's eye's were randomized to treatment or
control, which would seem to suggest comparable baseline
characteristics for each pair of eyes, as well as an active
intervention on one side where a difference in treatment effect
between each eye is being analyzed.

It is not clear from John's description above if there is one hip
that will be treated versus one as a control and whether the extent
of disease at baseline is similar in each pair of hips. Presumably
the timing of hip replacements will be staggered at some level,
even if there is comparable disease, simply due to post-op recovery
time and surgical risk. In cases where the disease between each hip
is materially different, that would be another factor to consider,
however I would defer to orthopaedic physicians/surgeons from a
subject matter expertise consideration. It is possible that the
bilateral hip replacement data might be more of a parallel to
bilateral breast cancer data, if each breast were to be tracked
separately.

I have cc'd Terry here, hoping that he might jump in and offer some
insights into the pros/cons of using coxme versus coxph with either
a cluster or strata based approach, or perhaps even a frailty based
approach as in 9.4.1 in the book.

Regards,

Marc




-- David.


You also might find the following of interest:

http://bjo.bmj.com/content/71/9/645.full.pdf

http://www.ncbi.nlm.nih.gov/pubmed/6885

http://www.ncbi.nlm.nih.gov/pubmed/22078901



Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
posting guide http://www.R-project.org/posting-guide.html and
provide commented, minimal, self-contained, reproducible code.


David Winsemius Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
posting guide http://www.R-project.org/posting-guide.html and
provide commented, minimal, self-contained, reproducible code.





Confidentiality Statement: This email message, including any
attachments, is for 

Re: [R] Change values in a dateframe-Speed TEST

2013-07-26 Thread arun
Hi Michel,
Sorry, I misunderstood your question.
You could also try:
library(plyr)
df1New-droplevels(ddply(TEST,.(Matricule),function(x) {x[,c(Nom,Prenom)]- 
x[1,c(Nom,Prenom)];x}))
df2New-droplevels(ddply(TEST,.(Matricule),function(x) {x[,c(Nom,Prenom)]- 
x[nrow(x),c(Nom,Prenom)];x}))
 identical(df1,df1New)
#[1] TRUE
 identical(df2,df2New)
#[1] TRUE

#or

library(data.table)
dt1- data.table(TEST)
dt2-dt1
dt1[,Nom:=head(Nom,1),by=Matricule]
dt1[,Prenom:=head(Prenom,1),by=Matricule]
 identical(df1,droplevels(as.data.frame(dt1)))
#[1] TRUE

dt2[,Nom:=tail(Nom,1),by=Matricule]
dt2[,Prenom:=tail(Prenom,1),by=Matricule]
 identical(df2,droplevels(as.data.frame(dt2)))
#[1] TRUE


#If you are considering speed, then ?data.table() would be useful
set.seed(28)
dfTest- as.data.frame(matrix(sample(1:50,1e6*5,replace=TRUE),ncol=5))

system.time({res1-do.call(rbind,lapply(split(dfTest,dfTest$V1),
    FUN=function(x) {x[,c(V2,V3)] - 
x[1,c(V2,V3)];x}))})
#  user  system elapsed 
#  4.452   0.036   4.499 
row.names(res1)-1:nrow(res1)

dtNew- data.table(dfTest)
system.time({dtNew[,V2:=head(V2,1),by=V1]
    dtNew[,V3:=head(V3,1),by=V1]
     dtNew-dtNew[order(V1)]  #here, the dataset was not pre-sorted, so just to 
keep the same order as the above solution

    })
# user  system elapsed 
#   0.132   0.000   0.133 
identical(res1,as.data.frame(dtNew))
#[1] TRUE


A.K.




- Original Message -
From: Arnaud Michel michel.arn...@cirad.fr
To: Berend Hasselman b...@xs4all.nl
Cc: R help r-help@r-project.org
Sent: Thursday, July 25, 2013 3:59 AM
Subject: Re: [R] Change values in a dateframe-Speed TEST

Le 25/07/2013 08:50, Berend Hasselman a écrit :
 On 25-07-2013, at 08:35, Arnaud Michel michel.arn...@cirad.fr wrote:

 But I just noticed that the two solutions are not comparable :
 the change concern only Nom and Prenom (solution Berend) and not also Sexe 
 or Date.de.naissance orother variables (solution Arun) that can changed. But 
 my question was badly put.
 Indeed:-)

 But that can be remedied with (small correction w.r.t. initial solution: 
 drop=TRUE removed; not relevant here)

 r1 - droplevels(do.call(rbind,lapply(split(TEST,TEST$Matricule),
                      FUN=function(x) {x[,1:ncol(x)] - x[1,1:ncol(x)];x})))

 and

 r2 - droplevels(do.call(rbind,lapply(split(TEST,TEST$Matricule),
                      FUN=function(x) {x[,1:ncol(x)] - 
x[nrow(x),1:ncol(x)];x})))
Thank you but I keep
{x[,c(Nom,Prénom)] - x[nrow(x),c(Nom,Prénom)];x} because in the 
dataframe there are other variables that I do not want to change. I want 
change only Nom and Prénom

PS : ?w.r.t.
Michel

 Less elegant than alternative with ave

 Berend

 Michel

 Le 25/07/2013 08:06, Arnaud Michel a écrit :
 Hi

 For a dataframe with name PaysContrat1 and with
 nrow(PaysContrat1)
 [1] 52366

 the test of system.time is :

 system.time(droplevels(do.call(rbind,lapply(split(PaysContrat1,PaysContrat1$Matricule),
 FUN=function(x) {x[,c(Nom,Prénom)] - 
 x[nrow(x),c(Nom,Prénom),drop=TRUE];x}
    user  system elapsed
   14.03    0.00   14.04

 system.time(droplevels(PaysContrat1[with(PaysContrat1,ave(seq_along(Matricule),Matricule,FUN=min))
  ,]  ))
    user  system elapsed
     0.2     0.0     0.2

 Michel

 Le 24/07/2013 15:29, arun a écrit :
 Hi Michel,
 You could try:


 df1New-droplevels(TEST[with(TEST,ave(seq_along(Matricule),Matricule,FUN=min)),])
 row.names(df1New)-1:nrow(df1New)
 df2New-droplevels(TEST[with(TEST,ave(seq_along(Matricule),Matricule,FUN=max)),])
 row.names(df2New)-1:nrow(df2New)
   identical(df1New,df1)
 #[1] TRUE
   identical(df2New,df2)
 #[1] TRUE
 A.K.



 - Original Message -
 From: Arnaud Michel michel.arn...@cirad.fr
 To: R help r-help@r-project.org
 Cc:
 Sent: Wednesday, July 24, 2013 2:39 AM
 Subject: [R] Change values in a dateframe

 Hello

 I have the following problem :
 The dataframe TEST has multiple lines for a same person because :
 there are differents values of Nom or differents values of Prenom
 but the values of Matricule or Sexe or Date.de.naissance are the same.

 TEST - structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 8L,
 5L, 6L, 9L, 3L, 3L, 7L), .Label = c(CHICHE, GEOF, GUTIER,
 JACQUE, LANGUE, LANGUE-LOPEZ, RIVIER, TRU, VINCENT
 ), class = factor), Prenom = structure(c(8L, 3L, 4L, 5L, 1L,
 2L, 2L, 9L, 6L, 7L, 7L), .Label = c(Edgar, Elodie, Jeanine,
 Jeannine, Michel, Michele, Michèle, Michelle, Victor
 ), class = factor), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
 1L, 1L, 2L, 1L, 1L, 1L), .Label = c(Féminin, Masculin), class =
 factor),
       Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
       1L, 3L, 3L, 3L), .Label = c(03/09/1940, 04/03/1946, 07/12/1947,
       18/11/1945, 27/09/1947, 29/12/1936, 30/03/1935), class =
 factor)), .Names = c(Matricule,
 Nom, Prenom, Sexe, Date.de.naissance), class = data.frame,
 row.names = c(NA,
 -11L))


 I would want to make homogeneous the information 

Re: [R] Function, that assigns two vectors to each other

2013-07-26 Thread arun
HI,
You could try:
my.data- structure...
pe-round(pe,0)
peMax-as.data.frame(sapply(paste0(a,1:4),function(x) {x1-my.data[,x]; 
unsplit(lapply(split(x1,x1),function(y) {x2-row.names(pe)[pe[,x]%in% y]; 
x3-x2[which.max(as.numeric(gsub(\\D+,,x2)))];rep(x3,length(y))}),x1)}),stringsAsFactors=FALSE)
 names(peMax)- paste0(pe,1:4)
 my.dataNew- cbind(my.data,peMax)
 
  my.dataNew
#   pa  a1  a2  a3  a4  pe1  pe2  pe3  pe4
#1   1  84 113  96  76  12%  89%  24%   0%
#2   2 108 101 108 106  90%  45%  80%  72%
#3   3 113  99 110  94 100%  36%  89%  25%
#4   4  99 108  99 124  78%  61%  49% 100%
#5   5  98 122 118  91  72% 100% 100%  12%
#6   6  88  92 100 103  27%  12%  58%  46%
#7   7  90  90  90 107  45%   2%  12%  78%
#8   8  89 110  89 106  38%  79%   5%  72%
#9   9  95 109 102 113  57%  72%  67%  89%
#10 10  77  95  99  96   0%  23%  49%  34%

If you don't want the % attached to the number
 my.dataNew[,6:9]-lapply(my.dataNew[6:9],function(x) 
as.numeric(gsub(\\D+,,x)))
 my.dataNew
#   pa  a1  a2  a3  a4 pe1 pe2 pe3 pe4
#1   1  84 113  96  76  12  89  24   0
#2   2 108 101 108 106  90  45  80  72
#3   3 113  99 110  94 100  36  89  25
#4   4  99 108  99 124  78  61  49 100
#5   5  98 122 118  91  72 100 100  12
#6   6  88  92 100 103  27  12  58  46
#7   7  90  90  90 107  45   2  12  78
#8   8  89 110  89 106  38  79   5  72
#9   9  95 109 102 113  57  72  67  89
#10 10  77  95  99  96   0  23  49  34
A.K. 



- Original Message -
From: Anne-Marie B. Gallrein gallr...@psychologie.tu-dresden.de
To: John Kane jrkrid...@inbox.com
Cc: r-help@r-project.org
Sent: Thursday, July 25, 2013 7:28 AM
Subject: Re: [R] Function, that assigns two vectors to each other

Hello guys, I created an example data set:

structure(list(pa = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), a1 = c(84,
108, 113, 99, 98, 88, 90, 89, 95, 77), a2 = c(113, 101, 99, 108,
122, 92, 90, 110, 109, 95), a3 = c(96, 108, 110, 99, 118, 100,
90, 89, 102, 99), a4 = c(76, 106, 94, 124, 91, 103, 107, 106,
113, 96)), .Names = c(pa, a1, a2, a3, a4), row.names = c(NA,
-10L), class = data.frame)

So the data frame contains the numbers of my participants (1 to 10) and the 
score, they hit on 4 tasks (a1 to a4).

I wrote this function, to use on the data:


pe-apply(X=my.data[,c(a1,a2,a3,a4)],
          
           MARGIN=2,
          
           FUN=quantile,
          
           probs=seq(0,1,by=.01),
          
           na.rm=TRUE)

round(pe,0)


It computes the percentiles of each task. So when using this function I know, 
that e.g.
a person who got 77 points on task 1 (a1) has a percentile of 0%.
If a person scores 88 points then he/she got the percentiles 21% to 27%, so 27% 
got the same amount of points or less.
In comparison in task 4 (a4) a person reaching 77 points has a percentile of 1%.

Now I want to add 4 columns to my.data (pe1 to pe4).

The final data frame my.data shall have 10 rows and 9 columns

These columns (pe1 to pe4) shall show the maximum percentile someone reached 
according to his points for each task.
So for the person who reached 77 points in a1 the respective pe1 would be 0.
For all the people who reached 88 points in a1 the respective pe1 would be 27.
For all the people who reached 77 points in a4 the respective pe1 would be 1.
The final data frame my.data shall have 10 rows and 9 columns.

So for the first participant (pa=1), the pe's would be a1=84  -- pe=12; a2=113 
 -- pe=89, a3=96 -- pe=24, a4=76 -- pe=0

I hope, that is clearer than before :)

Thanks a lot,

Anne




Am 24.07.2013 14:47, schrieb John Kane:
 Welcome  to R-help
 it is a bit hard to see exactly what you want without data. Rest of the 
 explanation looks good though it appears you may have sent this in HTML and 
 the list asks for text.  It strips out the html and we lose any html format.

 Can I suggest reading these 
 https://github.com/hadley/devtools/wiki/Reproducibility
  
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

 and then getting back to use with some data.  The best way to provide data , 
 as is described in the above links is to use dput()  (type ?dput for help ) 
 and then just copy and paste the results into the mail.



 John Kane
 Kingston ON Canada


 -Original Message-
 From: gallr...@psychologie.tu-dresden.de
 Sent: Wed, 24 Jul 2013 12:25:35 +0200
 To: r-help@r-project.org
 Subject: [R] Function, that assigns two vectors to each other

 Hey guys,

 In my data setv (KD) I have 4 columns
 (Punkte.AG1,Punkte.AG2,Punkte.AG3,Punkte.WI) I'm interested in.

 These columns contain the participants' scores of a specific task.

 I computed the percentiles of the columns using this code:

 pe-apply(X=KD[,c(Punkte.AG1,Punkte.AG2,Punkte.AG3,Punkte.WI)],

 MARGIN=2,

 FUN=quantile,

 probs=seq(0,1,by=.01),

 na.rm=TRUE)

 round(pe,0)


 This is the output (to the 20^th percentile):

 pe

 Punkte.AG1 Punkte.AG2 Punkte.AG3 Punkte.WI

 0%6319

 1%74311

 2%86312

 3%87412

 4%97512

 5%98512

 6%108512

 7%108512

 

Re: [R] How to split two levels several times?

2013-07-26 Thread arun


May be this also helps:
XXX: dataset
rl-rle(as.character(XXX$electrode))
 
dat-do.call(rbind,lapply(seq_along(rl$lengths),function(i){x1-if(rl$values[i]==electrode1
  (rl$lengths[i]%/%31)) rep(3,rl$lengths[i]%/%3) else 
rl$lengths[i];data.frame(Len=x1,Val=rl$values[i])}))
 lst1-split(cumsum(dat[,1]),((seq_along(dat[,1])-1)%/%2)+1)
vec1-sapply(lst1,max)
vec2-c(1,vec1[-length(vec1)]+1)
res-  lapply(seq_along(lst1),function(i) {x1-lst1[[i]]; 
XXX[seq(vec2[i],max(x1)),]})
 res
#[[1]]
 #  electrode length
#1 electrode1    5.7
#2 electrode1    6.3
#3 electrode1    6.2
#4 electrode2   11.4
#5 electrode2    9.7
#
#[[2]]
 #   electrode length
#6  electrode3   14.2
#7  electrode3   14.8
#8  electrode3   12.6
#9  electrode2   11.4
#10 electrode2    9.7
#
#[[3]]
 #   electrode length
#11 electrode4   17.0
#12 electrode4   16.3
#13 electrode4   17.8
#14 electrode4   18.3
#15 electrode4   16.9
#16 electrode4   18.5
#17 electrode1    5.7
#18 electrode1    6.3
#19 electrode1    6.2

#[[4]]
 #   electrode length
#20 electrode1    5.7
#21 electrode1    6.3
#22 electrode1    6.2
#23 electrode3   14.2
#24 electrode3   14.8
#25 electrode3   12.6

Also, tested in cases like below:
XXX1- structure(list(electrode = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 
3L, 3L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L), .Label = c(electrode1, 
electrode2, electrode3, electrode4), class = factor), 
    length = c(5.7, 6.3, 6.2, 11.4, 9.7, 14.2, 14.8, 12.6, 11.4, 
    9.7, 17, 16.3, 17.8, 18.3, 16.9, 18.5, 5.7, 6.3, 6.2, 7.7, 
    7.3, 6.2, 6.7, 6.8, 6.9, 5.7, 6.3, 6.2, 14.2, 14.8, 12.6)), .Names = 
c(electrode, 
length), class = data.frame, row.names = c(NA, -31L))

XXX2-structure(list(electrode = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 
3L, 3L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 3L, 3L, 3L), .Label = c(electrode1, electrode2, 
electrode3, electrode4), class = factor), length = c(5.7, 
6.3, 6.2, 11.4, 9.7, 14.2, 14.8, 12.6, 11.4, 9.7, 17, 16.3, 17.8, 
18.3, 16.9, 18.5, 5.7, 6.3, 6.2, 7.7, 7.3, 6.2, 6.7, 6.8, 6.9, 
14.2, 14.8, 12.6)), .Names = c(electrode, length), class = data.frame, 
row.names = c(NA, 
-28L))

rl-rle(as.character(XXX1$electrode))
 
dat-do.call(rbind,lapply(seq_along(rl$lengths),function(i){x1-if(rl$values[i]==electrode1
  (rl$lengths[i]%/%31)) rep(3,rl$lengths[i]%/%3) else 
rl$lengths[i];data.frame(Len=x1,Val=rl$values[i])}))
 lst1-split(cumsum(dat[,1]),((seq_along(dat[,1])-1)%/%2)+1)
vec1-sapply(lst1,max)
vec2-c(1,vec1[-length(vec1)]+1)
res1-  lapply(seq_along(lst1),function(i) {x1-lst1[[i]]; 
XXX1[seq(vec2[i],max(x1)),]})


rl-rle(as.character(XXX2$electrode))
 
dat-do.call(rbind,lapply(seq_along(rl$lengths),function(i){x1-if(rl$values[i]==electrode1
  (rl$lengths[i]%/%31)) rep(3,rl$lengths[i]%/%3) else 
rl$lengths[i];data.frame(Len=x1,Val=rl$values[i])}))
 lst1-split(cumsum(dat[,1]),((seq_along(dat[,1])-1)%/%2)+1)
vec1-sapply(lst1,max)
vec2-c(1,vec1[-length(vec1)]+1)
res2-  lapply(seq_along(lst1),function(i) {x1-lst1[[i]]; 
XXX2[seq(vec2[i],max(x1)),]})

Didn't test it extensively.  So, it may fail in other situations.
A.K.






- Original Message -
From: Rui Barradas ruipbarra...@sapo.pt
To: dennis1...@gmx.net
Cc: r-help@r-project.org
Sent: Thursday, July 25, 2013 2:53 PM
Subject: Re: [R] How to split two levels several times?

Hello,

I think the following does what you want. (I don't know if it makes much 
sense but it works.)



lens - rle(as.character(XXX$electrode))$lengths
m - length(lens) %/% 2
idx - rep(1:m, sapply(1:m, function(.m) sum(lens[(2*.m - 1):(2*.m)])))
if(length(lens) %% 2 != 0){
    idx - c(idx, rep(m + 1, lens[length(lens)]))
    sp_idx - split(idx, idx)
    n - length(sp_idx[[m]])
    if(n %/% 2  length(sp_idx[[m + 1]]))
        sp_idx[[m]][(n %/% 2 + 1):n] - sp_idx[[m + 1]][1]
    else
        sp_idx[[m]][(n - length(sp_idx[[m + 1]]) + 1):n] -  sp_idx[[m + 1]][1]
    idx - unlist(sp_idx)
}

sp - split(XXX, idx)
sp



Rui Barradas

Em 25-07-2013 11:40, dennis1...@gmx.net escreveu:
 Hi Rui
 once more thank you for your help. But the code does so far not solve the 
 problem because it still treats rows 17-22 (repeated appearance of 
 electrode1) as one single level. However as can be seen by rows 1-3 (or rows 
 17-19 and rows 20-22) and the order of the length variable (row 1 = 5.7, row 
 2 = 6.3, row 3 = 6.2) electrode1 consists only of 3 rows. Maybe that was not 
 made absolutely clear by me. As described in my mail before if by chance (or 
 systematically) it happens to be that electrode1 appears right after each 
 other in the table then the code should split it “half way”.

 So idx should not return
   [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4

 but instead 6 times number 4 at the end
   [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4

 Do you have any solution?


 Gesendet: Mittwoch, 24. Juli 2013 um 23:47 Uhr
 Von: Rui Barradas ruipbarra...@sapo.pt
 An: dennis1...@gmx.net
 Cc: 

[R] How do I get a a p-value for the output of an lme model with lme4?

2013-07-26 Thread Maria Sol Lago
Hi there,

I just started using lme4 and I have a question about obtaining p-values. I'm 
trying to get p-values for the output of a linear mixed-effects model. In my 
experiment  I have a 2 by 2 within subjects design, fully crossing two factors, 
Gram and Number. This is the command I used to run the model:

m - lmer(RT ~ Gram*Number + (1|Subject) + (0+Gram+Number|Subject) + 
(1|Item_number),data= data)

If I understand this code, I am getting coefficients for the two fixed effects 
(Gram and Number) and their interaction, and I am fitting a model that has 
by-subject intercepts and slopes for the two fixed effects, and a by-item 
intercept for them. Following Barr et al. (2013), I thought that this code gets 
rid of the correlation parameters. I don't want estimate the correlations 
because I want to get the p-values using pvals.fnc (), and I read that this 
function doesn't work if there are correlations in the model.

The command seems to work:

m
Linear mixed model fit by REML 
Formula: RT ~ Gram * Number + (1 | Subject) + (0 + Gram + Number | Subject) 
+ (1 |Item_number) 
   Data: mverb[mverb$Region == 06v1, ] 
   AIC   BIC logLik deviance REMLdev
 20134 20204 -1005420138   20108
Random effects:
 Groups  NameVariance  Std.Dev. Corr  
Item_number (Intercept)   273.508  16.5381   
 Subject Gramgram0.000   0.   
 Gramungram   3717.213  60.9689NaN
 Number159.361   7.7046NaN -1.000 
 Subject (Intercept) 14075.240 118.6391   
 Residual35758.311 189.0987   
Number of obs: 1502, groups: Item_number, 48; Subject, 32

Fixed effects:
 Estimate Std. Error t value
(Intercept)402.520 22.321  18.033
Gram1  -57.788 14.545  -3.973
Number1 -4.191  9.858  -0.425
Gram1:Number1   15.693 19.527   0.804

Correlation of Fixed Effects:
(Intr) Gram1  Numbr1
Gram1   -0.181  
Number1 -0.034  0.104   
Gram1:Nmbr1  0.000 -0.002 -0.011

However, when I try to calculate the p-values I still get an error message:

pvals.fnc(m, withMCMC=T)$fixed
Error in pvals.fnc(m, withMCMC = T) : 
MCMC sampling is not implemented in recent versions of lme4
  for models with random correlation parameters

Am I making a mistake when I specify my model? Shouldn't pvals.fnc() work if I 
removed the correlations?

Thanks for your help!

--Sol
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Duplicated function with conditional statement

2013-07-26 Thread vanessa van der vaart
Hi everybody,,
I have a question about R function duplicated(). I have spent days try to
figure this out,but I cant find any solution yet. I hope somebody can help
me..
this is my data:

subj=c(1,1,1,2,2,3,3,3,4,4)
response=c('sample','sample','buy','sample','buy','sample','
sample','buy','sample','buy')
product=c(1,2,3,2,2,3,2,1,1,4)
tt=data.frame(subj, response, product)

the data look like this:

 subj response product
1 1   sample   1
2 1   sample   2
3 1  buy  3
4 2   sample   2
5 2 buy   2
6 3   sample   3
7 3   sample   2
8 3 buy   1
9 4  sample   1
10   4   buy4

I want to create new  column based on the value on response and product
column. if the value on product is duplicated, then  the value on new column
is 1, otherwise is 0.
but I want to add conditional statement that the value on product column
will only be considered as duplicated if the value on response column is
'buy'.
for illustration, the table should look like this:

subj response product newcolumn
1 1   sample   1  0
2 1   sample   2  0
3 1  buy  3  0
4 2   sample   2  0
5 2 buy   2  0
6 3   sample   3  1
7 3   sample   2   1
8 3 buy   1   0
9 4  sample   11
10   4   buy   4 0


can somebody help me?
any help will be appreciated.
I am new in this mailing list, so forgive me in advance, If I did not  ask
the question appropriately.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] duplicated() with conditional statement

2013-07-26 Thread arun
Hi,
You may try this (didn't get time to test this extensively)
indx-which(tt$response!=buy)
tt$newcolumn-0
 tt[unlist(lapply(seq_along(indx),function(i) {x1-if(indx[i]==nrow(tt)) 
indx[i] else seq(indx[i]+1,indx[i+1]-1);x2-rbind(tt[indx[1:i],],tt[x1,]); 
if(any(x2$response==sample)) 
row.names(x2[duplicated(x2$product),])})),newcolumn]-1
 tt
#   subj response product newcolumn
#1 1   sample   1 0
#2 1   sample   2 0
#3 1  buy   3 0
#4 2   sample   2 0
#5 2  buy   2 0
#6 3   sample   3 1
#7 3   sample   2 1
#8 3  buy   1 0
#9 4   sample   1 1
#10    4  buy   4 0
A.K.








From: vanessa van der vaart vanessa.va...@gmail.com
To: smartpink...@yahoo.com 
Sent: Thursday, July 25, 2013 3:49 PM
Subject: Re: duplicated() with conditional statement



thank you for the reply.
It based on entire data set. 

subj response product newcolumn
1     1   sample       1          0         
2     1   sample       2          0         
3     1      buy          3           0       
4     2   sample       2          0        . 
5     2         buy       2           0
6     3   sample       3          1
7     3   sample       2           1
8     3         buy       1           0
9     4  sample       1            1
10   4       buy       4             0

I am sorry i didnt question it very clearly, let me change the conditional 
statement, I hope you can understand. i will explain by example

as you can see, almost every number is duplicated, but only in row 6th,7th,and 
9th the value on column is 1.

on row4th, the value is duplicated( 2 already occurred on 2nd row),but since 
the value is considered as duplicated only if the value is duplicated where the 
response is 'buy' than the value on column, on row4th still zero. 

On row 6th, where the value product column is 3. 3 is already occurred in 3rd 
row where the value on response is 'buy', so the value on column should be 1

I hope it can understand the conditional statement. 





On Thu, Jul 25, 2013 at 8:25 PM, smartpink...@yahoo.com wrote:

Hi,
May be I understand it incorrectly.
Your new column value doesn't correspond to your conditional statement.  Also, 
is this duplication based on entire dataset or within subj.
quote author='misseb'
Hi everybody,,
I have a question about R function duplicated(). I have spent days try to
figure this out,but I cant find any solution yet. I hope somebody can help
me..
this is my data:

subj=c(1,1,1,2,2,3,3,3,4,4)
response=c('sample','sample','buy','sample','buy','sample','sample','buy','sample','buy')
product=c(1,2,3,2,2,3,2,1,1,4)
tt=data.frame(subj, response, product)

the data look like this:

 subj response product
1     1   sample       1
2     1   sample       2
3     1      buy          3
4     2   sample       2
5     2         buy       2
6     3   sample       3
7     3   sample       2
8     3         buy       1
9     4  sample       1
10   4       buy        4



I want to create new  column based on the value on response and product
column. if the value on product is duplicated, then  the value on new column
is 1, otherwise is 0.
but I want to add conditional statement that the value on product column
will only be considered as duplicated if the value on response column is
'buy'.
for illustration, the table should look like this:

subj response product newcolumn
1     1   sample       1          0
2     1   sample       2          0
3     1      buy          3           0
4     2   sample       2          0
5     2         buy       2           0
6     3   sample       3          1
7     3   sample       2           1
8     3         buy       1           0
9     4  sample       1            1
10   4       buy       4             0


can somebody help me?
any help will be appreciated.
I am new in this mailing list, so forgive me in advance, If I did not  ask
the question appropriately.





/quote
Quoted from:
http://r.789695.n4.nabble.com/duplicated-with-conditional-statement-tp4672342.html


_
Sent from http://r.789695.n4.nabble.com



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Pairwise comparison between columns, logic

2013-07-26 Thread arun
HI,
Not sure about what your expected output would be.  Also 'CEBPA' was not 
present in the Data.txt. 

gset- read.table(Names.txt,header=TRUE,stringsAsFactors=FALSE)
 temp1- read.table(Data.txt,header=TRUE,stringsAsFactors=FALSE)
lst1-split(temp1,temp1$Names)
mat1-combn(gset[-1,1],2) #removed CEBPA
library(plyr)

lst2-lapply(split(mat1,col(mat1)),function(x) 
{x1-join_all(lst1[x],by=patient_id,type=inner);x1[patient_id] })
names(lst2)-apply(mat1,2,paste,collapse=_)
do.call(rbind,lst2)
#   patient_id
#DNMT3A_FLT3.1 LAML-AB-2811-TB #common ids between DNMT3A and FLT3
#DNMT3A_FLT3.2 LAML-AB-2816-TB
#DNMT3A_FLT3.3 LAML-AB-2818-TB
#DNMT3A_IDH1.1 LAML-AB-2802-TB#common ids between DNMT3A and IDH1.  If you 
wanted it as separate dataframes, use `lst2`.
#DNMT3A_IDH1.2 LAML-AB-2822-TB
#DNMT3A_NPM1.1 LAML-AB-2802-TB
#DNMT3A_NPM1.2 LAML-AB-2809-TB
#DNMT3A_NPM1.3 LAML-AB-2811-TB
#DNMT3A_NPM1.4 LAML-AB-2816-TB
#DNMT3A_NRAS   LAML-AB-2816-TB
#FLT3_NPM1.1   LAML-AB-2811-TB
#FLT3_NPM1.2   LAML-AB-2812-TB
#FLT3_NPM1.3   LAML-AB-2816-TB
#FLT3_NRAS LAML-AB-2816-TB
#IDH1_NPM1 LAML-AB-2802-TB
#NPM1_NRAS LAML-AB-2816-TB
A.K.



Hello R experts, 

I am trying to solve the following logic. 
I have two input files. The first file (Names.txt) that has two columns: 
Column1 Column2 
CEBPA   CEBPA 
DNMT3A  DNMT3A 
FLT3FLT3 
IDH1IDH1 
NPM1NPM1 
NRASNRAS 
and the second input file Data.txt has two columns Names, patient_id. 
Namepatient_id 
DNMT3A  LAML-AB-2802-TB 
DNMT3A  LAML-AB-2809-TB 
DNMT3A  LAML-AB-2811-TB 
DNMT3A  LAML-AB-2816-TB 
DNMT3A  LAML-AB-2818-TB 
DNMT3A  LAML-AB-2822-TB 
DNMT3A  LAML-AB-2824-TB 
FLT3LAML-AB-2811-TB 
FLT3LAML-AB-2812-TB 
FLT3LAML-AB-2814-TB 
FLT3LAML-AB-2816-TB 
FLT3LAML-AB-2818-TB 
FLT3LAML-AB-2825-TB 
FLT3LAML-AB-2830-TB 
FLT3LAML-AB-2834-TB 
IDH1LAML-AB-2802-TB 
IDH1LAML-AB-2821-TB 

 What I am attempting to do is for each name in first column of 
names.txt, I do a pairwise comparison with the other names in the second
 column based on which patient ids are common. 
To explain in detail: 
As an example: I extract patient_ids for CEBPA and DNMT3A and see 
which are common, then I do the same for CEBPA and FLT3 and so on for 
CEBPA and the next name in column 2. 
So far the script I have written only does the comparison with the 
first name in the list. So essentially with itself. I am not sure why 
this logic is not working for all the names in column 2 for a single 
name in column 1. 

Below is my script: 

gset-read.table(Names.txt,header=F,na.strings = ., as.is=T) # reading in 
the genes 
temp-read.table(Data.txt,header=T,sep=\t) 


# 
  
  all-length(unique(temp$fpatient_id)) 
  final-c() 
  
  both.ab - list() 
  both - list() 
  temp.b - matrix() 
  
  for(i in 1:nrow(gset))  # Loop for genes in the first column 
  
  { 
    
    temp2-temp[which(temp$Column1 %in% gset[i,]),] 
    num.mut-length(unique(temp2$patient_id)) 
    
    temp.a -temp[which(temp$Column1 == gset[i,1]),] 
  
    for(j in 1:(nrow(gset))  # Loop for genes in the second column 
            
    { 
      temp.b -temp[which(temp$Column2 == gset[j,2]),] 
      # See which patient_ids of temp.a are in temp.b 
      both.ab[[i]]-temp.a[which(temp.a$patient_id %in% temp.b$patient_id),] 
    } 

    both[[i]]-both.ab[[i]] 
    
    num.both-length(unique(both[[i]]$patient_id)) 
    
    line-c(paste(gset[i, which(!(is.na(gset[i,]))) ],collapse=/), num.mut, 
all, num.mut/all, num.both) 
    final-rbind(final,line) 
  } 
Names.txtData.txtScript.txt

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] duplicated() with conditional statement

2013-07-26 Thread arun
Hi,
Sorry,`indx` should be:
indx-which(tt$response==buy) #I changed indx but forgot about it
 tt$newcolumn-0
  tt[unlist(lapply(seq_along(indx),function(i) {x1-if(indx[i]==nrow(tt)) 
indx[i] else seq(indx[i]+1,indx[i+1]-1);x2-rbind(tt[indx[1:i],],tt[x1,]); 
if(any(x2$response==sample)) 
row.names(x2[duplicated(x2$product),])})),newcolumn]-1
 tt
   subj response product newcolumn
1 1   sample   1 0
2 1   sample   2 0
3 1  buy   3 0
4 2   sample   2 0
5 2  buy   2 0
6 3   sample   3 1
7 3   sample   2 1
8 3  buy   1 0
9 4   sample   1 1
10    4  buy   4 0
A.K.








From: vanessa van der vaart vanessa.va...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Thursday, July 25, 2013 8:55 PM
Subject: Re: duplicated() with conditional statement



hii thanks for the code, I tried the code but i got the error message,
Error in from:to : NA/NaN argument


I dont know what doest it mean, and I dont know how to fix it..
could you help please..

thank you very much in advance





On Thu, Jul 25, 2013 at 10:52 PM, arun smartpink...@yahoo.com wrote:

Hi,
You may try this (didn't get time to test this extensively)
indx-which(tt$response!=buy)
tt$newcolumn-0
 tt[unlist(lapply(seq_along(indx),function(i) {x1-if(indx[i]==nrow(tt)) 
indx[i] else seq(indx[i]+1,indx[i+1]-1);x2-rbind(tt[indx[1:i],],tt[x1,]); 
if(any(x2$response==sample)) 
row.names(x2[duplicated(x2$product),])})),newcolumn]-1
 tt
#   subj response product newcolumn
#1 1   sample   1 0
#2 1   sample   2 0
#3 1  buy   3 0
#4 2   sample   2 0
#5 2  buy   2 0
#6 3   sample   3 1
#7 3   sample   2 1
#8 3  buy   1 0
#9 4   sample   1 1
#10    4  buy   4 0
A.K.









From: vanessa van der vaart vanessa.va...@gmail.com
To: smartpink...@yahoo.com
Sent: Thursday, July 25, 2013 3:49 PM
Subject: Re: duplicated() with conditional statement




thank you for the reply.
It based on entire data set. 

subj response product newcolumn
1     1   sample       1          0         
2     1   sample       2          0         
3     1      buy          3           0       
4     2   sample       2          0        . 
5     2         buy       2           0
6     3   sample       3          1
7     3   sample       2           1
8     3         buy       1           0
9     4  sample       1            1
10   4       buy       4             0

I am sorry i didnt question it very clearly, let me change the conditional 
statement, I hope you can understand. i will explain by example

as you can see, almost every number is duplicated, but only in row 6th,7th,and 
9th the value on column is 1.

on row4th, the value is duplicated( 2 already occurred on 2nd row),but since 
the value is considered as duplicated only if the value is duplicated where 
the response is 'buy' than the value on column, on row4th still zero. 

On row 6th, where the value product column is 3. 3 is already occurred in 3rd 
row where the value on response is 'buy', so the value on column should be 1

I hope it can understand the conditional statement. 





On Thu, Jul 25, 2013 at 8:25 PM, smartpink...@yahoo.com wrote:

Hi,
May be I understand it incorrectly.
Your new column value doesn't correspond to your conditional statement.  
Also, is this duplication based on entire dataset or within subj.
quote author='misseb'
Hi everybody,,
I have a question about R function duplicated(). I have spent days try to
figure this out,but I cant find any solution yet. I hope somebody can help
me..
this is my data:

subj=c(1,1,1,2,2,3,3,3,4,4)
response=c('sample','sample','buy','sample','buy','sample','sample','buy','sample','buy')
product=c(1,2,3,2,2,3,2,1,1,4)
tt=data.frame(subj, response, product)

the data look like this:

 subj response product
1     1   sample       1
2     1   sample       2
3     1      buy          3
4     2   sample       2
5     2         buy       2
6     3   sample       3
7     3   sample       2
8     3         buy       1
9     4  sample       1
10   4       buy        4



I want to create new  column based on the value on response and product
column. if the value on product is duplicated, then  the value on new column
is 1, otherwise is 0.
but I want to add conditional statement that the value on product column
will only be considered as duplicated if the value on response column is
'buy'.
for illustration, the table should look like this:

subj response product newcolumn
1     1   sample       1          0
2     1   sample       2          0
3     1      buy          3           0
4     2   sample       2          0
5     2         buy       2           0
6     3   sample       3          

[R] help on carrying forward several vectors of rownames

2013-07-26 Thread franklin johnson
Dear reader,
I'm trying to use multiple vectors as rownames for the read.table function.
I tried to insert these vectors as a list, e.g. row.names=c(1,4), from the
excel file. I tried other ways, as if the argument only took continuous tab
separated
columns, e.g. row.names=c(1:4). But, this also did not work.
Is there a solution to this problem?
Regards,
-- 
Franklin

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Maintaining data order in factanal with missing data

2013-07-26 Thread s00123776
Hi,

 

I'm new to R, so sorry if this is a simple answer. I'm currently trying to
collapse some ordinal variables into a composite; the program ideally should
take a data frame as input, perform a factor analysis, compute factor
scores, sds, etc., and return the rescaled scores and loadings. The
difficulty I'm having is that my data set contains a number of NA, which I
am excluding from the analysis using complete.cases(), and thus the
incomplete cases are skipped. These functions are for a longitudinal data
set with repeated waves of data, so the final rescaled scores from each wave
need to be saved as variables grouped by a unique ID (DMID). The functions
I'm trying to implement are as follows:

 

weighted.sd-function(x,w){

sum.w-sum(w)

sum.w2-sum(w^2)

mean.w-sum(x*w)/sum(w)

 
x.sd.w-sqrt((sum.w/(sum.w^2-sum.w2))*sum(w*(x-mean.w)^2))

return(x.sd.w)

}

 

re.scale-function(f.scores, raw.data, loadings){

 
fz.scores-(f.scores+mean(f.scores))/(sd(f.scores))

 
means-apply(raw.data,1,weighted.mean,w=loadings)

 
sds-apply(raw.data,1,weighted.sd,w=loadings)

grand.mean-mean(means)

grand.sd-mean(sds)

 
final.scores-((fz.scores*grand.sd)+grand.mean)

return(final.scores)

}

 

get.scores-function(data){

 
fact-factanal(data[complete.cases(data),],factors=1,scores=regression)

f.scores-fact$scores[,1]

f.loads-fact$loadings[,1]

rescaled.scores-re.scale(f.scores,
data[complete.cases(data),], f.loads)

output.list-list(rescaled.scores, f.loads)

names(output.list)-c(rescaled.scores,
factor.loadings)

return(output.list)

}

 

init.dfs-function(){

 
ab.1.df-subset(ab.df,,select=c(dmid,g5oab2:g5ovb1))

 
ab.2.df-subset(ab.df,,select=c(dmid,w2oab3:w2ovb1))

ab.3.df-subset(ab.df,,select=c(dmid,
w3oab3, w3oab4, w3oab7, w3oab8, w3ovb1))



ab.1.fa-get.scores(ab.1.df[-1])

ab.2.fa-get.scores(ab.2.df[-1])

ab.3.fa-get.scores(ab.3.df[-1])


}

 

Thanks for your help,

 

Justin


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GGplot 2 – cannot get histogram and box plot axis to match.

2013-07-26 Thread ONKELINX, Thierry
Dear John,

Use xlim() and ylim() instead of expand_limits()

library(ggplot2)

#sample data from ggplot2
data(Cars93, package = MASS)
dataSet - Cars93

#variables to calculate the range to extend the axis dataVector - 
unlist(dataSet[,MPG.city])

dataRange - diff(range(dataSet$MPG.city))

graphRange - c(min(dataSet$MPG.city) - dataRange/5,
max(dataSet$MPG.city) + dataRange/5)

#making the box plot
theBoxPlot - ggplot(dataSet,aes_string(x = MPG.city, y = MPG.city))

theBoxPlot -
  theBoxPlot  + geom_boxplot() + coord_flip() + ylim(limits = graphRange)
print(theBoxPlot)


#making the histogram
thePlot - ggplot(dataSet,aes_string(x = MPG.city))
thePlot - thePlot + geom_histogram()  + xlim(graphRange)
print(thePlot)



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
Forest
team Biometrie  Kwaliteitszorg / team Biometrics  Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-Oorspronkelijk bericht-
Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens 
John W. Clow
Verzonden: donderdag 25 juli 2013 20:03
Aan: r-help@r-project.org
Onderwerp: [R] GGplot 2 – cannot get histogram and box plot axis to match.

Problem:
I am trying to get the histogram and box plot x axis to match. I’ve tried using 
the expand_limits function to make the axis match but that didn’t make the axis 
match. The histogram’s axis are still consistently larger than the ones for the 
box plot (though the function did help). Does anyone have a suggestion as to 
what I should do instead?


Background:
I am building a Shiny app that displays a histogram below a bar chart for a set 
of data that a user uploads to the app. If you want to see the app, go here 
http://spark.rstudio.com/jclow/Archive20130725HistogramApp/
To run the app, select “Use Sample Data” , then select  “MPG.city” under choose 
a column, then finally select box plot.


Sample code:
Below is a snippet of my code to demonstrate the problems I have.

library(ggplot2)

#sample data from ggplot2
data(Cars93, package = MASS)
dataSet - Cars93

#variables to calculate the range to extend the axis dataVector - 
unlist(dataSet[,MPG.city])

dataRange - max(dataVector) - min(dataVector)

graphRange - c(min(dataVector) - dataRange/5,
max(dataVector) + dataRange/5)

#making the box plot
theBoxPlot - ggplot(dataSet,aes_string(x = MPG.city,y = MPG.city))

theBoxPlot = theBoxPlot  + geom_boxplot() + expand_limits(y= graphRange) + 
coord_flip()
print(theBoxPlot)


#making the histogram
thePlot - ggplot(dataSet,aes_string(x = MPG.city)) thePlot -thePlot + 
geom_histogram()  + expand_limits(x= graphRange)

print(thePlot)


Thank you for taking the time to read this.

John Clow
UCSB Student

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
* * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en 
binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is 
door een geldig ondertekend document.
The views expressed in this message and any annex are purely those of the 
writer and may not be regarded as stating an official position of INBO, as long 
as the message is not confirmed by a duly signed document.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lme (weights) and glht

2013-07-26 Thread ONKELINX, Thierry
Dear Sibylle,

Have you tried to create a new variable?

ME$fDiversity - factor(ME$Diversity)

H08_lme - lme(
log(Height2008_mean) ~ fDiversity,
data = ME,
random = ~ 1|Plot/SubPlot,
weights = varPower(form = ~Diversity),
na.action = na.omit,
subset = ME$Species == Ace_pse,
method = ML
)
summary(H08_lme)
anova(H08_lme)
g_H08_lme - glht(H08_lme, linfct = mcp(fDiversity = Tukey))

Please note that I’ve added (a lot of) whitespace to your code to make it 
easier to read.

Best regards,

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
Forest
team Biometrie  Kwaliteitszorg / team Biometrics  Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.bemailto:thierry.onkel...@inbo.be
www.inbo.behttp://www.inbo.be/

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey

Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens 
Sibylle Stöckli
Verzonden: donderdag 25 juli 2013 12:16
Aan: r-help@r-project.org
Onderwerp: [R] lme (weights) and glht

Dear R members,

I tried to fit an lme model and to use the glht function of multcomp.
However, the glht function gives me some errors when using
weights=varPower().
The glht error makes sense as glht needs factor levels and the model
works fine without weights=. Does anyone know a solution so I do not
have to change the lme model?

Thanks
Sibylle

-- works fine
ME$Diversity=factor(ME$Diversity)
H08_lme-lme(log(Height2005_mean)~Diversity, data=ME, random=~1|Plot/
SubPlot, na.action=na.omit, subset=ME$Species==Pse_men, method=ML)
summary(H08_lme)
anova(H08_lme)
g_H08_lme-glht(H08_lme, linfct=mcp(Diversity=Tukey))
print(summary(g_H08_lme))

-- using lme with weights I changed the order of factor() and
introduced as.factor in the model

H08_lme-lme(log(Height2008_mean)~as.factor(Diversity), data=ME,
random=~1|Plot/SubPlot, weights=varPower(form=~Diversity),
na.action=na.omit, subset=ME$Species==Ace_pse, method=ML)
summary(H08_lme)
anova(H08_lme)
ME$Diversity=factor(ME$Diversity)
g_H08_lme-glht(H08_lme, linfct=mcp(Diversity=Tukey))

Error in mcp2matrix(model, linfct = linfct) :
   Variable(s) �Diversity� have been specified in �linfct� but cannot
be found in �model�!
[[alternative HTML version deleted]]
__
R-help@r-project.orgmailto:R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
* * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en 
binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is 
door een geldig ondertekend document.
The views expressed in this message and any annex are purely those of the 
writer and may not be regarded as stating an official position of INBO, as long 
as the message is not confirmed by a duly signed document.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] number of items to replace is not a multiple of replacement length

2013-07-26 Thread Pascal Oettli

Hello,

Once again, the matrix EWMA has not the correct size.

Did you carefully read the answer by Thomas Stewart?
https://stat.ethz.ch/pipermail/r-help/attachments/20130724/c454b0f7/attachment.pl

Extract of his reply: When you expand the example to 5 stocks,
there will be 15 elements (5 variances and 10 covariances)

 EWMA-matrix(nrow=T,ncol=15))

Regards,
Pascal

On 26/07/2013 14:37, G Girija wrote:

Hi All,

I have 5 stock values and i am calculating EWMA
followed the logic as given ind following link.[
http://www.orecastingfinancialrisk.com/3.htmlhttp://www.forecastingfinancialrisk.com/3.html
]

library('tseries')
returns[,1]-returns[,1]-mean(returns[,1])
returns[,2]-returns[,2]-mean(returns[,2])
returns[,3]-returns[,3]-mean(returns[,3])
returns[,4]-returns[,4]-mean(returns[,4])
returns[,5]-returns[,5]-mean(returns[,5])
T-length(returns[,1])
T
EWMA-matrix(nrow=T,ncol=5)
lambda=0.94
S-cov(returns)
S
EWMA[1,] - S[lower.tri(S,diag=TRUE)]


*Error in EWMA[1, ] - S[lower.tri(S, diag = TRUE)] : *
*  number of items to replace is not a multiple of replacement length*
*
*
for(i in 2:T)
{
   S- lambda*S +(1-lambda)*t(returns[i])%*% returns[i]
EWMA[i,] - S[lower.tri(S,diag=TRUE)]
}
*
*

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Maintaining data order in factanal with missing data

2013-07-26 Thread PIKAL Petr
Hi

You provided functions, so far so good. But without data it would be quite 
difficult to understand what the functions do and where could be the issue.

I suspect combination of complete cases selection together with subset and 
factor behaviour. But I can be completely out of target too.

Petr

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of s00123...@myacu.edu.au
 Sent: Friday, July 26, 2013 9:35 AM
 To: r-help@r-project.org
 Subject: [R] Maintaining data order in factanal with missing data
 
 Hi,
 
 
 
 I'm new to R, so sorry if this is a simple answer. I'm currently trying
 to collapse some ordinal variables into a composite; the program
 ideally should take a data frame as input, perform a factor analysis,
 compute factor scores, sds, etc., and return the rescaled scores and
 loadings. The difficulty I'm having is that my data set contains a
 number of NA, which I am excluding from the analysis using
 complete.cases(), and thus the incomplete cases are skipped. These
 functions are for a longitudinal data set with repeated waves of data,
 so the final rescaled scores from each wave need to be saved as
 variables grouped by a unique ID (DMID). The functions I'm trying to
 implement are as follows:
 
 
 
 weighted.sd-function(x,w){
 
 sum.w-sum(w)
 
 sum.w2-sum(w^2)
 
 mean.w-sum(x*w)/sum(w)
 
 
 x.sd.w-sqrt((sum.w/(sum.w^2-sum.w2))*sum(w*(x-mean.w)^2))
 
 return(x.sd.w)
 
 }
 
 
 
 re.scale-function(f.scores, raw.data, loadings){
 
 
 fz.scores-(f.scores+mean(f.scores))/(sd(f.scores))
 
 
 means-apply(raw.data,1,weighted.mean,w=loadings)
 
 
 sds-apply(raw.data,1,weighted.sd,w=loadings)
 
 grand.mean-mean(means)
 
 grand.sd-mean(sds)
 
 
 final.scores-((fz.scores*grand.sd)+grand.mean)
 
 return(final.scores)
 
 }
 
 
 
 get.scores-function(data){
 
 
 fact-
 factanal(data[complete.cases(data),],factors=1,scores=regression)
 
 f.scores-fact$scores[,1]
 
 f.loads-fact$loadings[,1]
 
 rescaled.scores-re.scale(f.scores,
 data[complete.cases(data),], f.loads)
 
 output.list-list(rescaled.scores,
 f.loads)
 
 names(output.list)-
 c(rescaled.scores,
 factor.loadings)
 
 return(output.list)
 
 }
 
 
 
 init.dfs-function(){
 
 
 ab.1.df-subset(ab.df,,select=c(dmid,g5oab2:g5ovb1))
 
 
 ab.2.df-subset(ab.df,,select=c(dmid,w2oab3:w2ovb1))
 
 ab.3.df-subset(ab.df,,select=c(dmid,
 w3oab3, w3oab4, w3oab7, w3oab8, w3ovb1))
 
 
 
 ab.1.fa-get.scores(ab.1.df[-1])
 
 ab.2.fa-get.scores(ab.2.df[-1])
 
 ab.3.fa-get.scores(ab.3.df[-1])
 
 
 }
 
 
 
 Thanks for your help,
 
 
 
 Justin
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to split two levels several times?

2013-07-26 Thread dennis1991
Hi Rui  Arun,
really thanks for investing so much time to deal with this problem! The code 
works now for this specific example. However it is not generally robust for 
slightly different situations. For instance it cannot correctly handle a slight 
variation of the table where I have again 4 types of electrodes of certain 
lengths. Electrode4 exists only 6 times. At the transition of the combinations 
3-4 and 4-1 there are 12 times electrode4 which stick together in the output 
$`9`. This leads to wrong splittings thereafter. Sorry for asking such tricky 
questions.

New table XXX

electrode   length
electrode1  206
electrode1  194
electrode1  182
electrode1  172
electrode1  169
electrode2  82
electrode2  78
electrode2  70
electrode2  58
electrode1  206
electrode1  194
electrode1  182
electrode1  172
electrode1  169
electrode3  260
electrode3  176
electrode3  137
electrode1  206
electrode1  194
electrode1  182
electrode1  172
electrode1  169
electrode4  86
electrode4  66
electrode4  64
electrode4  52
electrode4  27
electrode4  26
electrode2  82
electrode2  78
electrode2  70
electrode2  58
electrode1  206
electrode1  194
electrode1  182
electrode1  172
electrode1  169
electrode2  82
electrode2  78
electrode2  70
electrode2  58
electrode3  260
electrode3  176
electrode3  137
electrode2  82
electrode2  78
electrode2  70
electrode2  58
electrode4  86
electrode4  66
electrode4  64
electrode4  52
electrode4  27
electrode4  26
electrode3  260
electrode3  176
electrode3  137
electrode1  206
electrode1  194
electrode1  182
electrode1  172
electrode1  169
electrode3  260
electrode3  176
electrode3  137
electrode2  82
electrode2  78
electrode2  70
electrode2  58
electrode3  260
electrode3  176
electrode3  137
electrode4  86
electrode4  66
electrode4  64
electrode4  52
electrode4  27
electrode4  26
electrode4  86
electrode4  66
electrode4  64
electrode4  52
electrode4  27
electrode4  26
electrode1  206
electrode1  194
electrode1  182
electrode1  172
electrode1  169
electrode4  86
electrode4  66
electrode4  64
electrode4  52
electrode4  27
electrode4  26
electrode2  82
electrode2  78
electrode2  70
electrode2  58
electrode4  86
electrode4  66
electrode4  64
electrode4  52
electrode4  27
electrode4  26
electrode3  260
electrode3  176
electrode3  137





 Gesendet: Donnerstag, 25. Juli 2013 um 20:53 Uhr
 Von: Rui Barradas ruipbarra...@sapo.pt
 An: dennis1...@gmx.net
 Cc: r-help@r-project.org
 Betreff: Re: Aw: Re:  Re:  Re: [R] How to split two levels several times?

 Hello,

 I think the following does what you want. (I don't know if it makes much
 sense but it works.)



 lens - rle(as.character(XXX$electrode))$lengths
 m - length(lens) %/% 2
 idx - rep(1:m, sapply(1:m, function(.m) sum(lens[(2*.m - 1):(2*.m)])))
 if(length(lens) %% 2 != 0){
   idx - c(idx, rep(m + 1, lens[length(lens)]))
   sp_idx - split(idx, idx)
   n - length(sp_idx[[m]])
   if(n %/% 2  length(sp_idx[[m + 1]]))
   sp_idx[[m]][(n %/% 2 + 1):n] - sp_idx[[m + 1]][1]
   else
   sp_idx[[m]][(n - length(sp_idx[[m + 1]]) + 1):n] -  sp_idx[[m 
 + 1]][1]
   idx - unlist(sp_idx)
 }

 sp - split(XXX, idx)
 sp



 Rui Barradas

 Em 25-07-2013 11:40, dennis1...@gmx.net escreveu:
  Hi Rui
  once more thank you for your help. But the code does so far not solve the 
  problem because it still treats rows 17-22 (repeated appearance of 
  electrode1) as one single level. However as can be seen by rows 1-3 (or 
  rows 17-19 and rows 20-22) and the order of the length variable (row 1 = 
  5.7, row 2 = 6.3, row 3 = 6.2) electrode1 consists only of 3 rows. Maybe 
  that was not made absolutely clear by me. As described in my mail before if 
  by chance (or systematically) it happens to be that electrode1 appears 
  right after each other in the table then the code should split it “half 
  way”.
 
  So idx should not return
[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4
 
  but instead 6 times number 4 at the end
[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4
 
  Do you have any solution?
 
 
  Gesendet: Mittwoch, 24. Juli 2013 um 23:47 Uhr
  Von: Rui Barradas ruipbarra...@sapo.pt
  An: dennis1...@gmx.net
  Cc: r-help@r-project.org
  Betreff: Re: Aw: Re:  Re: [R] How to split two levels several times?
 
  Hello,
 
  As for the first question, note that in the case you describe, the
  resulting list of df's will not be a split of the original, there will
  be a duplication in the final 4-1 and 1-3. The following is a hack but
  will do it.
 
 
  lens - 

[R] a very urgunt and important question about R

2013-07-26 Thread mina orang
Dear all

I am doing a research in clinical psychology and I need to Generate a
block randomization for a clinical trial, with two treatments and
three therapists. I mean I must randomize the clients (all women) to
two kinds of treatment (called treatment as usual and narrative
exposure therapy) and three different therapists.
I tried to do this randomization by using R software and by using
stratification package but I didn't (and still can not) understand how
I should use R for this job.

Could you please help me and show me how I should use R for this kind
of randomization in a really easy way?

I really appreciate your help in advance.

Best
Mina

P.S. I just know little about statistics and software. Please teach me
in a really easy way.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help R

2013-07-26 Thread Mª Teresa Martinez Soriano
Hi to everyone, first of all thanks for this service, it is being very useful 
for me, thanks in 

advance.

I am new in R, so I suppose I could make really naive questions, I'm sorry.

I have to impute some missing values and I am trying to do it with VIM library 
trough Hot Deck 

imputation.

I writte:vmGUImenu(), and it opens a small window of: Visualization and 
Imputation of Missing 
Values 

and I select Imptation and Hot Deck and then one of the variables which I have 
to select is Select 

Variables to Build Domains.

I don't know which variables I have to select, I don't understand this. I have 
tried don't put 

anything and I get :

hotdeck(dataframe,variable=c(CRV.IE.2005,CRV.IE.2006,CRV.IE.2007,CRV.IE.2008,CRV.IE.2009,CR

V.IE.2010),ord_var=c(CRV.IE.2001,CRV.IE.2002,CRV.IE.2003,CRV.IE.2004,CRV.IE.2005,CRV.IE.20

06,CRV.IE.2007,CRV.IE.2008,CRV.IE.2009,CRV.IE.2010),domain_var=NULL,imp_suffix=_imp)

Mensajes de aviso perdido:

 In hotdeck(data, variable = vars, ord_var = sort, domain_var = domain,  

 Some NAs remained, maybe due to a too restrictive domain building!?

 In hotdeck(b, variable = c(CRV.IE.2005, CRV.IE.2006, CRV.IE.2007, Some 
NAs remained, maybe due 

to a too restrictive domain building!?


What should I  put in this variable??

Thanks in advance

Best regards

Teresa
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] n-dash instead of hyphens in plots

2013-07-26 Thread Anders Sand
Dear community,

How do I change the hyphens that appear when using '-' to n-dash in, for
example, x-axes?

Example code:
axis(1, c(1:2), c(1-4, 23-26), tck=.05)

Thanks
Anders

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] n-dash instead of hyphens in plots

2013-07-26 Thread Prof Brian Ripley

On 26/07/2013 09:36, Anders Sand wrote:

Dear community,

How do I change the hyphens that appear when using '-' to n-dash in, for
example, x-axes?

Example code:
axis(1, c(1:2), c(1-4, 23-26), tck=.05)


On what device on what platform?   The commonest answer would be

'if you want en-dash not hyphen, ask for en-dash and not hyphen'

en-dash is \u2013 where that is supported.




Thanks
Anders

[[alternative HTML version deleted]]

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


This did ask for 'at a minimum' information and no HTML.


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a very urgunt and important question about R

2013-07-26 Thread PIKAL Petr
Hi

I recommend to start with

library(fortunes)
fortune(surgery)

Anyway, beside Introduction to R you could go through 

CRAN Task View: Design of Experiments (DoE)  Analysis of Experimental Data

and specifically through

Experimental designs for clinical trials 
This task view only covers specific design of experiments packages; there may 
be some grey areas. Please, also consult the ClinicalTrials task view. 

*experiment contains tools for clinical experiments, e.g., a randomization 
tool, and it provides a few special analysis options for clinical trials. 
*Package gsDesign implements group sequential designs, 
*Package gsbDesign evaluates operating characteristics for group sequential 
Bayesian designs, 
*package asd implements adaptive sequential designs. 
*Package TEQR provides toxicity equivalence range designs (Blanchard and 
Longmate 2010) for phase I clinical trials. 
*The DoseFinding package provides functions for the design and analysis of 
dose-finding experiments (for example pharmaceutical Phase II clinical trials); 
it combines the facilities of the MCPMod package (maintenance discontinued; 
described in Bornkamp, Pinheiro and Bretz 2009) with a special type of optimal 
designs for dose finding situations (MED-optimal designs, or D-optimal designs, 
or a mixture of both; cf., Dette et al. 2008)

Which shall provide you with some insight.

Regards
Petr


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of mina orang
 Sent: Friday, July 26, 2013 10:26 AM
 To: r-help@r-project.org
 Subject: [R] a very urgunt and important question about R
 
 Dear all
 
 I am doing a research in clinical psychology and I need to Generate a
 block randomization for a clinical trial, with two treatments and three
 therapists. I mean I must randomize the clients (all women) to two
 kinds of treatment (called treatment as usual and narrative exposure
 therapy) and three different therapists.
 I tried to do this randomization by using R software and by using
 stratification package but I didn't (and still can not) understand how
 I should use R for this job.
 
 Could you please help me and show me how I should use R for this kind
 of randomization in a really easy way?
 
 I really appreciate your help in advance.
 
 Best
 Mina
 
 P.S. I just know little about statistics and software. Please teach me
 in a really easy way.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a very urgunt and important question about R

2013-07-26 Thread arun
Hi,
You may try:
library(psych)
?block.random()

Also, look this link
http://personality-project.org/revelle/syllabi/205/block.randomization.pdf
A.K.




- Original Message -
From: mina orang minaor...@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Friday, July 26, 2013 4:26 AM
Subject: [R] a very urgunt and important question about R

Dear all

I am doing a research in clinical psychology and I need to Generate a
block randomization for a clinical trial, with two treatments and
three therapists. I mean I must randomize the clients (all women) to
two kinds of treatment (called treatment as usual and narrative
exposure therapy) and three different therapists.
I tried to do this randomization by using R software and by using
stratification package but I didn't (and still can not) understand how
I should use R for this job.

Could you please help me and show me how I should use R for this kind
of randomization in a really easy way?

I really appreciate your help in advance.

Best
Mina

P.S. I just know little about statistics and software. Please teach me
in a really easy way.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Jul 26, 2013; 12:34am

2013-07-26 Thread Frank Harrell

Thanks Rich and Jim and apologies for omitting the line

x - c(285, 43.75, 94, 150, 214, 375, 270, 350, 41.5, 210, 30, 37.6,
281, 101, 210)

But the fundamental problem remains that vertical spacing is not correct 
unless I waste a lot of image space at the top.


Frank


--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Repeated measures Cox regression ??coxph??

2013-07-26 Thread Terry Therneau

Two choices.
If this were a linear model, do you like the GEE approach or a mixed effects approach?  
Assume that subject is a variable containing a per-subject identifier.


GEE approach: add + cluster(subject) to the model statement in coxph
Mixed models approach: Add  + (1|subject) to the model statment in coxme.

When only a very few subjects have multiple events, the mixed model (random effect) 
approach may not be reliable, however.  Multiple events per group are the fuel for 
estimation of the variance of the random effect, and with few of these the profile 
likelihood of the random effect will be very flat.  You can get esssentially a random 
estimate of the variance of the subject effect.  I'm still getting my arms around this 
issue, and it has taken me a long time.


Frailty is an alternate label for random effects when all we have is a random 
intercept.  Multiple labels for the same idea adds confusion, but nothing else.


Terry Therneau

On 07/25/2013 08:14 PM, Marc Schwartz wrote:

On Jul 25, 2013, at 4:45 PM, David Winsemiusdwinsem...@comcast.net  wrote:


On Jul 25, 2013, at 12:27 PM, Marc Schwartz wrote:


On Jul 25, 2013, at 2:11 PM, John Sorkinjsor...@grecc.umaryland.edu  wrote:


Colleagues,
Is there any R package that will allow one to perform a repeated measures Cox 
Proportional Hazards regression? I don't think coxph is set up to handle this 
type of problem, but I would be happy to know that I am not correct.
I am doing a study of time to hip joint replacement. As each person has two 
hips, a given person can appear in the dataset twice, once for the left hip and 
once for the right hip, and I need to account for the correlation of data from 
a single individual.
Thank you,
John



John,

See Terry's 'coxme' package:

http://cran.r-project.org/web/packages/coxme/index.html


When I looked over the description of coxme, I was concerned it was not really 
designed with this in mind. Looking at Therneau and Grambsch, I thought section 
8.4.2 in the 'Multiple Events per Subject' Chapter fit the analysis question 
well. There they compared the use of coxph( ...+cluster(ID),,...)  withcoxph( 
...+strata(ID),,...). Unfortunately I could not tell for sure which one was 
being described as superio but I think it was the cluster() alternative. I seem 
to remember there are discussions in the archives.


David,

I think that you raise a good point. The example in the book (I had to wait to 
get home to read it) is potentially different however, in that the subject's 
eye's were randomized to treatment or control, which would seem to suggest 
comparable baseline characteristics for each pair of eyes, as well as an active 
intervention on one side where a difference in treatment effect between each 
eye is being analyzed.

It is not clear from John's description above if there is one hip that will be 
treated versus one as a control and whether the extent of disease at baseline 
is similar in each pair of hips. Presumably the timing of hip replacements will 
be staggered at some level, even if there is comparable disease, simply due to 
post-op recovery time and surgical risk. In cases where the disease between 
each hip is materially different, that would be another factor to consider, 
however I would defer to orthopaedic physicians/surgeons from a subject matter 
expertise consideration. It is possible that the bilateral hip replacement data 
might be more of a parallel to bilateral breast cancer data, if each breast 
were to be tracked separately.

I have cc'd Terry here, hoping that he might jump in and offer some insights 
into the pros/cons of using coxme versus coxph with either a cluster or strata 
based approach, or perhaps even a frailty based approach as in 9.4.1 in the 
book.

Regards,

Marc



--
David.

You also might find the following of interest:

http://bjo.bmj.com/content/71/9/645.full.pdf

http://www.ncbi.nlm.nih.gov/pubmed/6885

http://www.ncbi.nlm.nih.gov/pubmed/22078901



Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to split two levels several times?

2013-07-26 Thread arun
Hi Dennis,
No problem.

As I mentioned, I was under the understanding that electrode1 occurs only in 
multiples of 3.  I think your earlier post sounded like that.  May be I was 
misunderstood.  So, my solution was based on that.  In the below table, I am 
not sure, how you wanted to split.  For ex. it is not clear, whether rows 1-5 
(electrode1) are only in the first list element or should it include  the next 
electrode electrode2 also?  Could you also show the expected output?

dat$Len
# [1]  5  4  5  3  5  6  4  5  4  3  4  6  3  5  3  4  3 12  5  6  4  6  3
 as.character(dat$Val)
# [1] electrode1 electrode2 electrode1 electrode3 electrode1
 #[6] electrode4 electrode2 electrode1 electrode2 electrode3
#[11] electrode2 electrode4 electrode3 electrode1 electrode3
#[16] electrode2 electrode3 electrode4 electrode1 electrode4
#[21] electrode2 electrode4 electrode3
A.K.



Hi Rui  Arun, 
really thanks for investing so much time to deal with this problem! 
The code works now for this specific example. However it is not 
generally robust for slightly different situations. For instance it 
cannot correctly handle a slight variation of the table where I have 
again 4 types of electrodes of certain lengths. Electrode4 exists only 6
 times. At the transition of the combinations 3-4 and 4-1 there are 12 
times electrode4 which stick together in the output $`9`. This leads to 
wrong splittings thereafter. Sorry for asking such tricky questions. 

New table XXX 

electrode   length 
electrode1  206 
electrode1  194 
electrode1  182 
electrode1  172 
electrode1  169 
electrode2  82 
electrode2  78 
electrode2  70 
electrode2  58 
electrode1  206 
electrode1  194 
electrode1  182 
electrode1  172 
electrode1  169 
electrode3  260 
electrode3  176 
electrode3  137 
electrode1  206 
electrode1  194 
electrode1  182 
electrode1  172 
electrode1  169 
electrode4  86 
electrode4  66 
electrode4  64 
electrode4  52 
electrode4  27 
electrode4  26 
electrode2  82 
electrode2  78 
electrode2  70 
electrode2  58 
electrode1  206 
electrode1  194 
electrode1  182 
electrode1  172 
electrode1  169 
electrode2  82 
electrode2  78 
electrode2  70 
electrode2  58 
electrode3  260 
electrode3  176 
electrode3  137 
electrode2  82 
electrode2  78 
electrode2  70 
electrode2  58 
electrode4  86 
electrode4  66 
electrode4  64 
electrode4  52 
electrode4  27 
electrode4  26 
electrode3  260 
electrode3  176 
electrode3  137 
electrode1  206 
electrode1  194 
electrode1  182 
electrode1  172 
electrode1  169 
electrode3  260 
electrode3  176 
electrode3  137 
electrode2  82 
electrode2  78 
electrode2  70 
electrode2  58 
electrode3  260 
electrode3  176 
electrode3  137 
electrode4  86 
electrode4  66 
electrode4  64 
electrode4  52 
electrode4  27 
electrode4  26 
electrode4  86 
electrode4  66 
electrode4  64 
electrode4  52 
electrode4  27 
electrode4  26 
electrode1  206 
electrode1  194 
electrode1  182 
electrode1  172 
electrode1  169 
electrode4  86 
electrode4  66 
electrode4  64 
electrode4  52 
electrode4  27 
electrode4  26 
electrode2  82 
electrode2  78 
electrode2  70 
electrode2  58 
electrode4  86 
electrode4  66 
electrode4  64 
electrode4  52 
electrode4  27 
electrode4  26 
electrode3  260 
electrode3  176 
electrode3  137 



- Original Message -
From: arun smartpink...@yahoo.com
To: dennis1...@gmx.net dennis1...@gmx.net
Cc: 
Sent: Friday, July 26, 2013 1:05 AM
Subject: Re: [R] How to split two levels several times?

Just to add:
I assumed that electrode1 would be found in multiples of 3.  I could be 
wrong.  



- Original Message -
From: arun smartpink...@yahoo.com
To: Rui Barradas ruipbarra...@sapo.pt
Cc: dennis1...@gmx.net dennis1...@gmx.net; r-help@r-project.org 
r-help@r-project.org
Sent: Friday, July 26, 2013 12:53 AM
Subject: Re: [R] How to split two levels several times?



May be this also helps:
XXX: dataset
rl-rle(as.character(XXX$electrode))
 
dat-do.call(rbind,lapply(seq_along(rl$lengths),function(i){x1-if(rl$values[i]==electrode1
  (rl$lengths[i]%/%31)) rep(3,rl$lengths[i]%/%3) else 
rl$lengths[i];data.frame(Len=x1,Val=rl$values[i])}))
 lst1-split(cumsum(dat[,1]),((seq_along(dat[,1])-1)%/%2)+1)
vec1-sapply(lst1,max)
vec2-c(1,vec1[-length(vec1)]+1)
res-  lapply(seq_along(lst1),function(i) {x1-lst1[[i]]; 
XXX[seq(vec2[i],max(x1)),]})
 res
#[[1]]
 #  electrode length
#1 electrode1    5.7
#2 electrode1    6.3
#3 electrode1    6.2
#4 electrode2   11.4
#5 electrode2    9.7
#
#[[2]]
 #   electrode length
#6  electrode3   14.2
#7  electrode3   14.8
#8  electrode3   

Re: [R] Maintaining data order in factanal with missing data

2013-07-26 Thread PIKAL Petr
Hi

Well, the function init.dfs does nothing as all data frames created inside it 
does not propagate to global environment and there is nothing what the function 
returns.

Tha last line (when used outside a function) gives warnings but there is no 
sign of error.

When 

 head(ab.1.df)
  dmid   g5oab2  g53  g54  g55   g5ovb1
11 1.418932 1.805227 2.791152 3.624116 3.425586
22 2.293907 1.187830 1.611237 1.748526 3.816533
33 2.836536 2.679523 1.279639 2.674986 2.452395
44 1.872259 3.278359 1.785872 2.458315 1.146480
55 1.467195 1.180747 3.564127 3.007682 2.109506
66 3.098512 3.151974 3.969379 3.750571 1.497358
 head(ab.2.df)
  dmid   w2oab3  w22  w23  w24   w2ovb1
11 4.831362 5.522764 7.809366 6.969172 7.398385
22 6.706346 4.101742 1.434697 5.266775 5.357641
33 3.653806 2.666885 1.209326 5.125556 4.963374
44 7.221255 7.649152 6.540398 6.648506 2.576081
55 1.848023 5.044314 2.761881 3.307220 1.454234
66 7.606429 4.911766 2.034813 2.638573 2.818834
 head(ab.3.df)
  dmid   w3oab3   w3oab4   w3oab7   w3oab8   w3ovb1
11 5.835609 6.108220 6.587721 2.451461 2.785467
22 4.973198 1.196815 6.388056 1.110877 4.226463
33 3.800367 6.697287 5.235345 6.666829 6.319073
44 1.093141 1.43 2.269252 3.194978 4.916342
55 1.975060 7.204516 4.825435 1.775874 3.484027
66 3.273361 2.243805 5.326547 5.720892 6.118723


 str(ab.1.fa)
List of 2
 $ rescaled.scores: Named num [1:154] 3.43 3.83 2.43 1.1 2.08 ...
  ..- attr(*, names)= chr [1:154] 1 2 3 4 ...
 $ factor.loadings: Named num [1:5] -0.0106 -0.0227 -0.1093 -0.0912 0.9975
  ..- attr(*, names)= chr [1:5] g5oab2 g53 g54 g55 ...
 str(ab.2.fa)
List of 2
 $ rescaled.scores: Named num [1:154] 6.34 5.24 5.3 1.91 2.16 ...
  ..- attr(*, names)= chr [1:154] 1 2 3 4 ...
 $ factor.loadings: Named num [1:5] -0.2042 0.0063 -0.2287 -0.0119 0.7138
  ..- attr(*, names)= chr [1:5] w2oab3 w22 w23 w24 ...
 str(ab.3.fa)
List of 2
 $ rescaled.scores: Named num [1:154] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
...
  ..- attr(*, names)= chr [1:154] 1 2 3 4 ...
 $ factor.loadings: Named num [1:5] -0.1172 0.0128 -0.0968 0.106 0.9975
  ..- attr(*, names)= chr [1:5] w3oab3 w3oab4 w3oab7 w3oab8 ...

Anyway I have no idea what you consider wrong?

Regards
Petr



 -Original Message-
 From: Justin Delahunty [mailto:a...@genius.net.au]
 Sent: Friday, July 26, 2013 2:22 PM
 To: PIKAL Petr; 'Justin Delahunty'; r-help@r-project.org
 Subject: RE: [R] Maintaining data order in factanal with missing data
 
 Hi Petr,
 
 Thanks for the quick response. Unfortunately I cannot share the data I
 am working with, however please find attached a suitable R workspace
 with generated data. It has the appropriate variable names, only the
 data has been changed.
 
 The last function in the list (init.dfs()) I call to subset the overall
 data set into the three waves, then conduct the factor analysis on each
 (1 factor CFA); it's just in a function to ease re-typing in a new
 workspace.
 
 
 Thanks,
 
 Justin
 
 -Original Message-
 From: PIKAL Petr [mailto:petr.pi...@precheza.cz]
 Sent: Friday, 26 July 2013 7:35 PM
 To: Justin Delahunty; r-help@r-project.org
 Subject: RE: [R] Maintaining data order in factanal with missing data
 
 Hi
 
 You provided functions, so far so good. But without data it would be
 quite difficult to understand what the functions do and where could be
 the issue.
 
 I suspect combination of complete cases selection together with subset
 and factor behaviour. But I can be completely out of target too.
 
 Petr
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
  project.org] On Behalf Of s00123...@myacu.edu.au
  Sent: Friday, July 26, 2013 9:35 AM
  To: r-help@r-project.org
  Subject: [R] Maintaining data order in factanal with missing data
 
  Hi,
 
 
 
  I'm new to R, so sorry if this is a simple answer. I'm currently
  trying to collapse some ordinal variables into a composite; the
  program ideally should take a data frame as input, perform a factor
  analysis, compute factor scores, sds, etc., and return the rescaled
  scores and loadings. The difficulty I'm having is that my data set
  contains a number of NA, which I am excluding from the analysis using
  complete.cases(), and thus the incomplete cases are skipped. These
  functions are for a longitudinal data set with repeated waves of
 data,
  so the final rescaled scores from each wave need to be saved as
  variables grouped by a unique ID (DMID). The functions I'm trying to
  implement are as follows:
 
 
 
  weighted.sd-function(x,w){
 
  sum.w-sum(w)
 
  sum.w2-sum(w^2)
 
  mean.w-sum(x*w)/sum(w)
 
 
  x.sd.w-sqrt((sum.w/(sum.w^2-sum.w2))*sum(w*(x-mean.w)^2))
 
  return(x.sd.w)
 
  }
 
 
 
  re.scale-function(f.scores, raw.data, 

[R] modeest with non-numeric data?

2013-07-26 Thread Tom Hopper
Hello,

I have recently discovered the modeest library, and am trying to understand
how to use it with non-numeric data (e.g. determining the most common last
name, or analysing customer demographics ​by zip code).

I have the mlv() function working for numeric (double and integer) data,
but it throws either an error or a warning and produces unexpected output
with character data. Any help is appreciated.

A simple example:


 my.rand.letters - sample(letters, size=100, replace=TRUE)

 mlv(my.rand.letters, mode=C(discrete))Error in match.arg(x, .distribList) : 
 'arg' must be of length 1In addition: There were 21 warnings (use warnings() 
 to see them)



 mlv(as.factor(my.rand.letters))Mode (most frequent value): NA NA
Bickel's modal skewness: -2
Call: mlv.factor(x = as.factor(my.rand.letters)) Warning message:In
discrete(x, ...) : NAs introduced by coercion



TIA,

Tom

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to split two levels several times?

2013-07-26 Thread arun


Hi Dennis,
I guess in this case, instead of Eletrode1 occuring 3 times, it is 
Electrode4 exists only 6 times.  If that is the situation:
just change:
XXX: data
rl-rle(as.character(XXX$electrode))
 
dat-do.call(rbind,lapply(seq_along(rl$lengths),function(i){x1-if(rl$values[i]==electrode4
  (rl$lengths[i]%/%61)) rep(6,rl$lengths[i]%/%6) else 
rl$lengths[i];data.frame(Len=x1,Val=rl$values[i])}))
 lst1-split(cumsum(dat[,1]),((seq_along(dat[,1])-1)%/%2)+1)
vec1-sapply(lst1,max)
vec2-c(1,vec1[-length(vec1)]+1)
res-  lapply(seq_along(lst1),function(i) {x1-lst1[[i]]; 
XXX[seq(vec2[i],max(x1)),]})
res
[[1]]
   electrode length
1 electrode1    206
2 electrode1    194
3 electrode1    182
4 electrode1    172
5 electrode1    169
6 electrode2 82
7 electrode2 78
8 electrode2 70
9 electrode2 58

[[2]]
    electrode length
10 electrode1    206
11 electrode1    194
12 electrode1    182
13 electrode1    172
14 electrode1    169
15 electrode3    260
16 electrode3    176
17 electrode3    137

[[3]]
    electrode length
18 electrode1    206
19 electrode1    194
20 electrode1    182
21 electrode1    172
22 electrode1    169
23 electrode4 86
24 electrode4 66
25 electrode4 64
26 electrode4 52
27 electrode4 27
28 electrode4 26

[[4]]
    electrode length
29 electrode2 82
30 electrode2 78
31 electrode2 70
32 electrode2 58
33 electrode1    206
34 electrode1    194
35 electrode1    182
36 electrode1    172
37 electrode1    169

[[5]]
    electrode length
38 electrode2 82
39 electrode2 78
40 electrode2 70
41 electrode2 58
42 electrode3    260
43 electrode3    176
44 electrode3    137

[[6]]
    electrode length
45 electrode2 82
46 electrode2 78
47 electrode2 70
48 electrode2 58
49 electrode4 86
50 electrode4 66
51 electrode4 64
52 electrode4 52
53 electrode4 27
54 electrode4 26

[[7]]
    electrode length
55 electrode3    260
56 electrode3    176
57 electrode3    137
58 electrode1    206
59 electrode1    194
60 electrode1    182
61 electrode1    172
62 electrode1    169

[[8]]
    electrode length
63 electrode3    260
64 electrode3    176
65 electrode3    137
66 electrode2 82
67 electrode2 78
68 electrode2 70
69 electrode2 58

[[9]]
    electrode length
70 electrode3    260
71 electrode3    176
72 electrode3    137
73 electrode4 86
74 electrode4 66
75 electrode4 64
76 electrode4 52
77 electrode4 27
78 electrode4 26

[[10]]
    electrode length
79 electrode4 86
80 electrode4 66
81 electrode4 64
82 electrode4 52
83 electrode4 27
84 electrode4 26
85 electrode1    206
86 electrode1    194
87 electrode1    182
88 electrode1    172
89 electrode1    169

[[11]]
    electrode length
90 electrode4 86
91 electrode4 66
92 electrode4 64
93 electrode4 52
94 electrode4 27
95 electrode4 26
96 electrode2 82
97 electrode2 78
98 electrode2 70
99 electrode2 58

[[12]]
 electrode length
100 electrode4 86
101 electrode4 66
102 electrode4 64
103 electrode4 52
104 electrode4 27
105 electrode4 26
106 electrode3    260
107 electrode3    176
108 electrode3    137


A.K.




- Original Message -
From: dennis1...@gmx.net dennis1...@gmx.net
To: Rui Barradas ruipbarra...@sapo.pt; r-help@r-project.org
Cc: 
Sent: Friday, July 26, 2013 6:07 AM
Subject: Re: [R] How to split two levels several times?

Hi Rui  Arun,
really thanks for investing so much time to deal with this problem! The code 
works now for this specific example. However it is not generally robust for 
slightly different situations. For instance it cannot correctly handle a slight 
variation of the table where I have again 4 types of electrodes of certain 
lengths. Electrode4 exists only 6 times. At the transition of the combinations 
3-4 and 4-1 there are 12 times electrode4 which stick together in the output 
$`9`. This leads to wrong splittings thereafter. Sorry for asking such tricky 
questions.

New table XXX

electrode    length
electrode1    206
electrode1    194
electrode1    182
electrode1    172
electrode1    169
electrode2    82
electrode2    78
electrode2    70
electrode2    58
electrode1    206
electrode1    194
electrode1    182
electrode1    172
electrode1    169
electrode3    260
electrode3    176
electrode3    137
electrode1    206
electrode1    194
electrode1    182
electrode1    172
electrode1    169
electrode4    86
electrode4    66
electrode4    64
electrode4    52
electrode4    27
electrode4    26
electrode2    82
electrode2    78
electrode2    70
electrode2    58
electrode1    206
electrode1    194
electrode1    182
electrode1    172
electrode1    169
electrode2    82
electrode2    78
electrode2    70
electrode2    58
electrode3    260
electrode3    176
electrode3    137
electrode2    82
electrode2    78
electrode2    70
electrode2    58
electrode4    86
electrode4    66
electrode4    64
electrode4    52
electrode4    27
electrode4 

Re: [R] How to split two levels several times?

2013-07-26 Thread arun
It would be better to wrap it in a function.

fun1- function(x,colName,N,value){
rl- rle(as.character(x[,colName]))
dat-do.call(rbind,lapply(seq_along(rl$lengths),function(i){x1-if(rl$values[i]==value
  (rl$lengths[i]%/%N1)) rep(N,rl$lengths[i]%/%N) else 
rl$lengths[i];data.frame(Len=x1,Val=rl$values[i])}))
lst1-split(cumsum(dat[,1]),((seq_along(dat[,1])-1)%/%2)+1)
vec1-sapply(lst1,max)
vec2-c(1,vec1[-length(vec1)]+1)
res-  lapply(seq_along(lst1),function(i) {x1-lst1[[i]]; 
x[seq(vec2[i],max(x1)),]})
res
}

fun1(XXX,electrode,6,electrode4)
#Using previous dataset XXX, XXX1, XXX2
fun1(XXX,electrode,3,electrode1)

fun1(XXX1,electrode,3,electrode1)

fun1(XXX2,electrode,3,electrode1)
A.K.

- Original Message -
From: arun smartpink...@yahoo.com
To: dennis1...@gmx.net dennis1...@gmx.net
Cc: R help r-help@r-project.org; Rui Barradas ruipbarra...@sapo.pt
Sent: Friday, July 26, 2013 9:26 AM
Subject: Re: [R] How to split two levels several times?



Hi Dennis,
I guess in this case, instead of Eletrode1 occuring 3 times, it is 
Electrode4 exists only 6 times.  If that is the situation:
just change:
XXX: data
rl-rle(as.character(XXX$electrode))
 
dat-do.call(rbind,lapply(seq_along(rl$lengths),function(i){x1-if(rl$values[i]==electrode4
  (rl$lengths[i]%/%61)) rep(6,rl$lengths[i]%/%6) else 
rl$lengths[i];data.frame(Len=x1,Val=rl$values[i])}))
 lst1-split(cumsum(dat[,1]),((seq_along(dat[,1])-1)%/%2)+1)
vec1-sapply(lst1,max)
vec2-c(1,vec1[-length(vec1)]+1)
res-  lapply(seq_along(lst1),function(i) {x1-lst1[[i]]; 
XXX[seq(vec2[i],max(x1)),]})
res
[[1]]
   electrode length
1 electrode1    206
2 electrode1    194
3 electrode1    182
4 electrode1    172
5 electrode1    169
6 electrode2 82
7 electrode2 78
8 electrode2 70
9 electrode2 58

[[2]]
    electrode length
10 electrode1    206
11 electrode1    194
12 electrode1    182
13 electrode1    172
14 electrode1    169
15 electrode3    260
16 electrode3    176
17 electrode3    137

[[3]]
    electrode length
18 electrode1    206
19 electrode1    194
20 electrode1    182
21 electrode1    172
22 electrode1    169
23 electrode4 86
24 electrode4 66
25 electrode4 64
26 electrode4 52
27 electrode4 27
28 electrode4 26

[[4]]
    electrode length
29 electrode2 82
30 electrode2 78
31 electrode2 70
32 electrode2 58
33 electrode1    206
34 electrode1    194
35 electrode1    182
36 electrode1    172
37 electrode1    169

[[5]]
    electrode length
38 electrode2 82
39 electrode2 78
40 electrode2 70
41 electrode2 58
42 electrode3    260
43 electrode3    176
44 electrode3    137

[[6]]
    electrode length
45 electrode2 82
46 electrode2 78
47 electrode2 70
48 electrode2 58
49 electrode4 86
50 electrode4 66
51 electrode4 64
52 electrode4 52
53 electrode4 27
54 electrode4 26

[[7]]
    electrode length
55 electrode3    260
56 electrode3    176
57 electrode3    137
58 electrode1    206
59 electrode1    194
60 electrode1    182
61 electrode1    172
62 electrode1    169

[[8]]
    electrode length
63 electrode3    260
64 electrode3    176
65 electrode3    137
66 electrode2 82
67 electrode2 78
68 electrode2 70
69 electrode2 58

[[9]]
    electrode length
70 electrode3    260
71 electrode3    176
72 electrode3    137
73 electrode4 86
74 electrode4 66
75 electrode4 64
76 electrode4 52
77 electrode4 27
78 electrode4 26

[[10]]
    electrode length
79 electrode4 86
80 electrode4 66
81 electrode4 64
82 electrode4 52
83 electrode4 27
84 electrode4 26
85 electrode1    206
86 electrode1    194
87 electrode1    182
88 electrode1    172
89 electrode1    169

[[11]]
    electrode length
90 electrode4 86
91 electrode4 66
92 electrode4 64
93 electrode4 52
94 electrode4 27
95 electrode4 26
96 electrode2 82
97 electrode2 78
98 electrode2 70
99 electrode2 58

[[12]]
 electrode length
100 electrode4 86
101 electrode4 66
102 electrode4 64
103 electrode4 52
104 electrode4 27
105 electrode4 26
106 electrode3    260
107 electrode3    176
108 electrode3    137


A.K.




- Original Message -
From: dennis1...@gmx.net dennis1...@gmx.net
To: Rui Barradas ruipbarra...@sapo.pt; r-help@r-project.org
Cc: 
Sent: Friday, July 26, 2013 6:07 AM
Subject: Re: [R] How to split two levels several times?

Hi Rui  Arun,
really thanks for investing so much time to deal with this problem! The code 
works now for this specific example. However it is not generally robust for 
slightly different situations. For instance it cannot correctly handle a slight 
variation of the table where I have again 4 types of electrodes of certain 
lengths. Electrode4 exists only 6 times. At the transition of the combinations 
3-4 and 4-1 there are 12 times electrode4 which stick together in the output 
$`9`. This leads to wrong splittings thereafter. Sorry for asking such tricky 
questions.

New table 

Re: [R] help on carrying forward several vectors of rownames

2013-07-26 Thread John Kane
Can you supply some code and data? At the moment the question in not clear, or 
not expressed in R terminology.  Maybe I'm just not awake yet but 
I tried to insert these vectors as a list, e.g. row.names=c(1,4), from the 
excel file
does not seem to make any sense at all probably because 'list' has a special 
meaning in R and I don't think that is what you mean. if it is, my appologies.


Perhaps have a look at
https://github.com/hadley/devtools/wiki/Reproducibility
 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

John Kane
Kingston ON Canada


 -Original Message-
 From: theobrom...@gmail.com
 Sent: Thu, 25 Jul 2013 22:00:32 -0700
 To: r-help@r-project.org
 Subject: [R] help on carrying forward several vectors of rownames
 
 Dear reader,
 I'm trying to use multiple vectors as rownames for the read.table
 function.
 I tried to insert these vectors as a list, e.g. row.names=c(1,4), from
 the
 excel file. I tried other ways, as if the argument only took continuous
 tab
 separated
 columns, e.g. row.names=c(1:4). But, this also did not work.
 Is there a solution to this problem?
 Regards,
 --
 Franklin
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


FREE 3D EARTH SCREENSAVER - Watch the Earth right on your desktop!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Maintaining data order in factanal with missing data

2013-07-26 Thread Justin Delahunty
Hi Petr,

Thanks for the quick response. Unfortunately I cannot share the data I am
working with, however please find attached a suitable R workspace with
generated data. It has the appropriate variable names, only the data has
been changed.

The last function in the list (init.dfs()) I call to subset the overall data
set into the three waves, then conduct the factor analysis on each (1 factor
CFA); it's just in a function to ease re-typing in a new workspace.


Thanks,

Justin

-Original Message-
From: PIKAL Petr [mailto:petr.pi...@precheza.cz] 
Sent: Friday, 26 July 2013 7:35 PM
To: Justin Delahunty; r-help@r-project.org
Subject: RE: [R] Maintaining data order in factanal with missing data

Hi

You provided functions, so far so good. But without data it would be quite
difficult to understand what the functions do and where could be the issue.

I suspect combination of complete cases selection together with subset and
factor behaviour. But I can be completely out of target too.

Petr

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- 
 project.org] On Behalf Of s00123...@myacu.edu.au
 Sent: Friday, July 26, 2013 9:35 AM
 To: r-help@r-project.org
 Subject: [R] Maintaining data order in factanal with missing data
 
 Hi,
 
 
 
 I'm new to R, so sorry if this is a simple answer. I'm currently 
 trying to collapse some ordinal variables into a composite; the 
 program ideally should take a data frame as input, perform a factor 
 analysis, compute factor scores, sds, etc., and return the rescaled 
 scores and loadings. The difficulty I'm having is that my data set 
 contains a number of NA, which I am excluding from the analysis using 
 complete.cases(), and thus the incomplete cases are skipped. These 
 functions are for a longitudinal data set with repeated waves of data, 
 so the final rescaled scores from each wave need to be saved as 
 variables grouped by a unique ID (DMID). The functions I'm trying to 
 implement are as follows:
 
 
 
 weighted.sd-function(x,w){
 
 sum.w-sum(w)
 
 sum.w2-sum(w^2)
 
 mean.w-sum(x*w)/sum(w)
 
 
 x.sd.w-sqrt((sum.w/(sum.w^2-sum.w2))*sum(w*(x-mean.w)^2))
 
 return(x.sd.w)
 
 }
 
 
 
 re.scale-function(f.scores, raw.data, loadings){
 
 
 fz.scores-(f.scores+mean(f.scores))/(sd(f.scores))
 
 
 means-apply(raw.data,1,weighted.mean,w=loadings)
 
 
 sds-apply(raw.data,1,weighted.sd,w=loadings)
 
 grand.mean-mean(means)
 
 grand.sd-mean(sds)
 
 
 final.scores-((fz.scores*grand.sd)+grand.mean)
 
 return(final.scores)
 
 }
 
 
 
 get.scores-function(data){
 
 
 fact-
 factanal(data[complete.cases(data),],factors=1,scores=regression)
 
 f.scores-fact$scores[,1]
 
 f.loads-fact$loadings[,1]
 
 rescaled.scores-re.scale(f.scores,
 data[complete.cases(data),], f.loads)
 
 output.list-list(rescaled.scores,
 f.loads)
 
 names(output.list)- 
 c(rescaled.scores,
 factor.loadings)
 
 return(output.list)
 
 }
 
 
 
 init.dfs-function(){
 
 
 ab.1.df-subset(ab.df,,select=c(dmid,g5oab2:g5ovb1))
 
 
 ab.2.df-subset(ab.df,,select=c(dmid,w2oab3:w2ovb1))
 
 ab.3.df-subset(ab.df,,select=c(dmid,
 w3oab3, w3oab4, w3oab7, w3oab8, w3ovb1))
 
 
 
 ab.1.fa-get.scores(ab.1.df[-1])
 
 ab.2.fa-get.scores(ab.2.df[-1])
 
 ab.3.fa-get.scores(ab.3.df[-1])
 
 
 }
 
 
 
 Thanks for your help,
 
 
 
 Justin
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting- 
 guide.html and provide commented, minimal, self-contained, 
 reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Maintaining data order in factanal with missing data

2013-07-26 Thread Justin Delahunty
Hi Petr,

So sorry, I accidentally attached the complete data set rather than the one
with missing values. I've attached the correct file to this email. RE:
init.dfs() being local, I hadn't even thought of that. I've been away from
OOP for close to 15 years now, so it might be time to revise!

The problem I have is that with missing values the list of factor scores
returned (ab.w1.fa$factor.scores) does not map onto the originating data
frame (ab.w1.df) as it no longer includes the cases which had missing
values. So while the original data set for ab.w1.df contains 154 ordered
cases, the factor analysis contains only 150.

I am seeking a way to map the values derived from the factor analysis
(ab.w1.fa$factor.scores) back to their original ordered position, so that
these factor score variables may be merged back into the master data frame
(ab.df). A unique ID for each case is available ($dmid) which I had thought
to use when merging the new variables, however I don't know how to implement
this.


Thanks for your help,

Justin


-Original Message-
From: PIKAL Petr [mailto:petr.pi...@precheza.cz] 
Sent: Friday, 26 July 2013 10:59 PM
To: Justin Delahunty; Justin Delahunty; r-help@r-project.org
Subject: RE: [R] Maintaining data order in factanal with missing data

Hi

Well, the function init.dfs does nothing as all data frames created inside
it does not propagate to global environment and there is nothing what the
function returns.

Tha last line (when used outside a function) gives warnings but there is no
sign of error.

When 

 head(ab.1.df)
  dmid   g5oab2  g53  g54  g55   g5ovb1
11 1.418932 1.805227 2.791152 3.624116 3.425586
22 2.293907 1.187830 1.611237 1.748526 3.816533
33 2.836536 2.679523 1.279639 2.674986 2.452395
44 1.872259 3.278359 1.785872 2.458315 1.146480
55 1.467195 1.180747 3.564127 3.007682 2.109506
66 3.098512 3.151974 3.969379 3.750571 1.497358
 head(ab.2.df)
  dmid   w2oab3  w22  w23  w24   w2ovb1
11 4.831362 5.522764 7.809366 6.969172 7.398385
22 6.706346 4.101742 1.434697 5.266775 5.357641
33 3.653806 2.666885 1.209326 5.125556 4.963374
44 7.221255 7.649152 6.540398 6.648506 2.576081
55 1.848023 5.044314 2.761881 3.307220 1.454234
66 7.606429 4.911766 2.034813 2.638573 2.818834
 head(ab.3.df)
  dmid   w3oab3   w3oab4   w3oab7   w3oab8   w3ovb1
11 5.835609 6.108220 6.587721 2.451461 2.785467
22 4.973198 1.196815 6.388056 1.110877 4.226463
33 3.800367 6.697287 5.235345 6.666829 6.319073
44 1.093141 1.43 2.269252 3.194978 4.916342
55 1.975060 7.204516 4.825435 1.775874 3.484027
66 3.273361 2.243805 5.326547 5.720892 6.118723


 str(ab.1.fa)
List of 2
 $ rescaled.scores: Named num [1:154] 3.43 3.83 2.43 1.1 2.08 ...
  ..- attr(*, names)= chr [1:154] 1 2 3 4 ...
 $ factor.loadings: Named num [1:5] -0.0106 -0.0227 -0.1093 -0.0912 0.9975
  ..- attr(*, names)= chr [1:5] g5oab2 g53 g54 g55 ...
 str(ab.2.fa)
List of 2
 $ rescaled.scores: Named num [1:154] 6.34 5.24 5.3 1.91 2.16 ...
  ..- attr(*, names)= chr [1:154] 1 2 3 4 ...
 $ factor.loadings: Named num [1:5] -0.2042 0.0063 -0.2287 -0.0119 0.7138
  ..- attr(*, names)= chr [1:5] w2oab3 w22 w23 w24 ...
 str(ab.3.fa)
List of 2
 $ rescaled.scores: Named num [1:154] NaN NaN NaN NaN NaN NaN NaN NaN NaN
NaN ...
  ..- attr(*, names)= chr [1:154] 1 2 3 4 ...
 $ factor.loadings: Named num [1:5] -0.1172 0.0128 -0.0968 0.106 0.9975
  ..- attr(*, names)= chr [1:5] w3oab3 w3oab4 w3oab7 w3oab8 ...

Anyway I have no idea what you consider wrong?

Regards
Petr



 -Original Message-
 From: Justin Delahunty [mailto:a...@genius.net.au]
 Sent: Friday, July 26, 2013 2:22 PM
 To: PIKAL Petr; 'Justin Delahunty'; r-help@r-project.org
 Subject: RE: [R] Maintaining data order in factanal with missing data
 
 Hi Petr,
 
 Thanks for the quick response. Unfortunately I cannot share the data I 
 am working with, however please find attached a suitable R workspace 
 with generated data. It has the appropriate variable names, only the 
 data has been changed.
 
 The last function in the list (init.dfs()) I call to subset the 
 overall data set into the three waves, then conduct the factor 
 analysis on each
 (1 factor CFA); it's just in a function to ease re-typing in a new 
 workspace.
 
 
 Thanks,
 
 Justin
 
 -Original Message-
 From: PIKAL Petr [mailto:petr.pi...@precheza.cz]
 Sent: Friday, 26 July 2013 7:35 PM
 To: Justin Delahunty; r-help@r-project.org
 Subject: RE: [R] Maintaining data order in factanal with missing data
 
 Hi
 
 You provided functions, so far so good. But without data it would be 
 quite difficult to understand what the functions do and where could be 
 the issue.
 
 I suspect combination of complete cases selection together with subset 
 and factor behaviour. But I can be completely out of target too.
 
 Petr
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- 
  project.org] On 

Re: [R] Maintaining data order in factanal with missing data

2013-07-26 Thread PIKAL Petr
Hi

There are probably better options but

merge(data.frame(x=1:154),data.frame(x=names(ab.1.fa[[1]]), y=ab.1.fa[[1]]), 
all.x=T)

gives you data frame with NA when there was missing value in the first 
data.frame.

You probably can automate the process a bit with nrow function.

Regards
Petr



 -Original Message-
 From: Justin Delahunty [mailto:a...@genius.net.au]
 Sent: Friday, July 26, 2013 3:34 PM
 To: PIKAL Petr; 'Justin Delahunty'; 'Justin Delahunty'; r-help@r-
 project.org
 Subject: RE: [R] Maintaining data order in factanal with missing data
 
 Hi Petr,
 
 So sorry, I accidentally attached the complete data set rather than the
 one with missing values. I've attached the correct file to this email.
 RE:
 init.dfs() being local, I hadn't even thought of that. I've been away
 from OOP for close to 15 years now, so it might be time to revise!
 
 The problem I have is that with missing values the list of factor
 scores returned (ab.w1.fa$factor.scores) does not map onto the
 originating data frame (ab.w1.df) as it no longer includes the cases
 which had missing values. So while the original data set for ab.w1.df
 contains 154 ordered cases, the factor analysis contains only 150.
 
 I am seeking a way to map the values derived from the factor analysis
 (ab.w1.fa$factor.scores) back to their original ordered position, so
 that these factor score variables may be merged back into the master
 data frame (ab.df). A unique ID for each case is available ($dmid)
 which I had thought to use when merging the new variables, however I
 don't know how to implement this.
 
 
 Thanks for your help,
 
 Justin
 
 
 -Original Message-
 From: PIKAL Petr [mailto:petr.pi...@precheza.cz]
 Sent: Friday, 26 July 2013 10:59 PM
 To: Justin Delahunty; Justin Delahunty; r-help@r-project.org
 Subject: RE: [R] Maintaining data order in factanal with missing data
 
 Hi
 
 Well, the function init.dfs does nothing as all data frames created
 inside it does not propagate to global environment and there is nothing
 what the function returns.
 
 Tha last line (when used outside a function) gives warnings but there
 is no sign of error.
 
 When
 
  head(ab.1.df)
   dmid   g5oab2  g53  g54  g55   g5ovb1
 11 1.418932 1.805227 2.791152 3.624116 3.425586
 22 2.293907 1.187830 1.611237 1.748526 3.816533
 33 2.836536 2.679523 1.279639 2.674986 2.452395
 44 1.872259 3.278359 1.785872 2.458315 1.146480
 55 1.467195 1.180747 3.564127 3.007682 2.109506
 66 3.098512 3.151974 3.969379 3.750571 1.497358
  head(ab.2.df)
   dmid   w2oab3  w22  w23  w24   w2ovb1
 11 4.831362 5.522764 7.809366 6.969172 7.398385
 22 6.706346 4.101742 1.434697 5.266775 5.357641
 33 3.653806 2.666885 1.209326 5.125556 4.963374
 44 7.221255 7.649152 6.540398 6.648506 2.576081
 55 1.848023 5.044314 2.761881 3.307220 1.454234
 66 7.606429 4.911766 2.034813 2.638573 2.818834
  head(ab.3.df)
   dmid   w3oab3   w3oab4   w3oab7   w3oab8   w3ovb1
 11 5.835609 6.108220 6.587721 2.451461 2.785467
 22 4.973198 1.196815 6.388056 1.110877 4.226463
 33 3.800367 6.697287 5.235345 6.666829 6.319073
 44 1.093141 1.43 2.269252 3.194978 4.916342
 55 1.975060 7.204516 4.825435 1.775874 3.484027
 66 3.273361 2.243805 5.326547 5.720892 6.118723
 
 
  str(ab.1.fa)
 List of 2
  $ rescaled.scores: Named num [1:154] 3.43 3.83 2.43 1.1 2.08 ...
   ..- attr(*, names)= chr [1:154] 1 2 3 4 ...
  $ factor.loadings: Named num [1:5] -0.0106 -0.0227 -0.1093 -0.0912
 0.9975
   ..- attr(*, names)= chr [1:5] g5oab2 g53 g54 g55 ...
  str(ab.2.fa)
 List of 2
  $ rescaled.scores: Named num [1:154] 6.34 5.24 5.3 1.91 2.16 ...
   ..- attr(*, names)= chr [1:154] 1 2 3 4 ...
  $ factor.loadings: Named num [1:5] -0.2042 0.0063 -0.2287 -0.0119
 0.7138
   ..- attr(*, names)= chr [1:5] w2oab3 w22 w23 w24 ...
  str(ab.3.fa)
 List of 2
  $ rescaled.scores: Named num [1:154] NaN NaN NaN NaN NaN NaN NaN NaN
 NaN NaN ...
   ..- attr(*, names)= chr [1:154] 1 2 3 4 ...
  $ factor.loadings: Named num [1:5] -0.1172 0.0128 -0.0968 0.106 0.9975
   ..- attr(*, names)= chr [1:5] w3oab3 w3oab4 w3oab7 w3oab8
 ...
 
 Anyway I have no idea what you consider wrong?
 
 Regards
 Petr
 
 
 
  -Original Message-
  From: Justin Delahunty [mailto:a...@genius.net.au]
  Sent: Friday, July 26, 2013 2:22 PM
  To: PIKAL Petr; 'Justin Delahunty'; r-help@r-project.org
  Subject: RE: [R] Maintaining data order in factanal with missing data
 
  Hi Petr,
 
  Thanks for the quick response. Unfortunately I cannot share the data
 I
  am working with, however please find attached a suitable R workspace
  with generated data. It has the appropriate variable names, only the
  data has been changed.
 
  The last function in the list (init.dfs()) I call to subset the
  overall data set into the three waves, then conduct the factor
  analysis on each
  (1 factor CFA); it's just in a function to ease re-typing in a new
  workspace.
 
 
  

Re: [R] prediction survival curves for coxph-models; how to extract the right strata per individual

2013-07-26 Thread Terry Therneau
It would help me to give advice if I knew what you wanted to do with the new curves.  
Plot, print, extract?


A more direct solution to your question will appear in the next release of the 
code, btw.

Terry T.


On 07/25/2013 05:00 AM, r-help-requ...@r-project.org wrote:

My problem is:



I have a coxph.model with several strata and other covariables.

Now I want to fit the estimated survival-curves for new data, using
survfit.coxph.

But this returns a prediction for each stratum per individual. So if I
have 15 new individuals and 10 strata, I have 150 fitted surivival curves
(or if I don't use the subscripts I have 15 predictions with the curves
for all strata pasted together)



Is there any possibility to get only the survival curves for the stratum
the new individual belongs to?




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] X matrix deemed to be singular and cbind

2013-07-26 Thread Soumitro Dey
Hi list,

While the X matrix deemed to be singular question has been answered in
the list for quite a few times, I have a twist to it.

I am using the coxph model for survival analysis on a dataset containing
over 160,000 instances and 46 independent variables and I have 2 scenarios:

1. If I use cbind on the 46 independent variables (many of which are
categorical), coxph runs without any frills. The problem however is that it
won't report which of the categorical variables (e.g. VERY HIGH, HIGH,
NEUTRAL, LOW or VERY LOW) are actually meaningful/significant(e.g. XHIGH
***, XLOW ., etc). Is there any way to check this?

2. If I don't use cbind, assuming it'll give me the details I am looking
for in the previous step, it throws me the X matrix deemed to be
singular, more precisely: X matrix deemed to be singular; variable 130
131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149
150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168
169 170 171 172 173 174 175 176 177 178 179 180 181

Could anyone please elaborate on how to get around problem #1 or #2?

Thanks!
SD

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Saving multiple rda-files as one rda-file

2013-07-26 Thread David Winsemius

On Jul 25, 2013, at 7:17 AM, Dark wrote:

 Hi, 
 
 Yes maybe I should have been more clear on my problem.
 I want to append the different data-frames back into one variable ( rbind )
 and save it as one R Data file.
 

Indeed. That was the operation I had in mind when I made my suggestions. 
Perhaps you need to create a set of toy dataframes with similar structure and 
then the audience can propose solutions. That's the usual process around these 
parts.

-- 
David.


 Regards Derk
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Saving-multiple-rda-files-as-one-rda-file-tp4672041p4672313.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Differential problem

2013-07-26 Thread Raphaëlle Carraud
Hello,

I am trying to solve the following algebraic differential equation :

dy1 = h - dy3
dy2 = g - dy4
y3 = K1*y1/(y3+y4)
y4 = K2*y2/(y3+y4)

I tried using the function daspk, but it gives me the following error :

 out - daspk(y = yini, dy = dyini, times = z, res = liquide, parms = 0)
daspk--  warning.. At T(=R1) and stepsize H (=R2) the
  nonlinear solver failed to converge
  repeatedly of with abs (H) = HMIN g, 0
Warning messages:
1: In daspk(y = yini, dy = dyini, times = z, res = liquide, parms = 0) :
  repeated convergence test failures on a step - inaccurate Jacobian or 
preconditioner?
2: In daspk(y = yini, dy = dyini, times = z, res = liquide, parms = 0) :
  Returning early. Results are accurate, as far as they go
 head(out)
 time   12 3 4  5  6 7 8 9
[1,]0 0.9 1.33 0 0 10 20 1e-04 0.001 0.001
[2,]0 0.9 1.33 0 0 10 20 1e-04 0.001 0.001

I read the documentation but it only says that daspk can only solve DAE of 
index one maximum, which I do not understand how to determine. I was wondering 
if I had to use the function radau but I do not get how to do the M matrix? I 
did not managed to get it with the example.
Could it be an error in my program? Here it is :

liquide - function(z, C, dC, parameters) {
  with(as.list(c(C,dC,parameters)),{

Ct = C[1] + C[2] + C[3] + C[4] + C[5] + C[6] + C[7] + C[8] + C[9] + Ceau
V - 1

K32 - 6.54*10^2  # m^3/mol
K33 - 1.37*10^-2 # m^3/mol
K34 - 0.330  # sans unité
K35 - 5.81*10^1  # sans unité

kf2 - 1.37   # m^3/mol
kf3 - 1.37   # m^3/mol
kf4 - 8.68*10^-1 # m^3
kf5 - 157.2*10^-6# m^3

K2 - 10^1.44*10^3# mol/m^3
K3 - 10^(-3.224)*10^3# mol/m^3
Ke - 10^-11  # mol/m^3

r1 - kf4*C[4]/V - kf4/K34*C[5]^2/(Ct*V)
r2 - kf3*C[1]*C[2] - kf3/K33*C[4]
r3 - kf2*C[1]^2 - kf2/K32*C[3]
r4 - kf5*C[3]/V - kf5/K35*C[5]*C[6]*C[8]/(Ct^2*V)

res1 - dC[1] + r2 + r3 # dNO2/dt
res2 - dC[2] + r2  # dNO/dt
res3 - dC[3] - r3 + r4 # dN2O4/dt
res4 - dC[4] - r2 + r1 # dN2O3/dt
res5 - dC[5] -2*r1 - r4 + dC[7]# dHNO2/dt
res6 - dC[6] - r4 + dC[8]  # dHNO3/dt
res7 - C[7] - K3*C[5]/C[9] # dNO2-/dt
res8 - C[8] - K2*C[6]/C[9] # dNO3-/dt
res9 - dC[9] - dC[8] - dC[7]   # dCH/dt

list(c(res1, res2, res3, res4, res5, res6, res7, res8, res9))
  })
}

yini - c(0.9, 1.33, 0, 0, 10,  20, 0.001, 22.9, 23)
dyini - c(1,1,1,1,1,1,1,1,1)

Qm - 4   # kg/h
Ceau - (Qm - yini[1]*0.046 - yini[2]*0.03 + yini[3]*0.092 + yini[4]*0.076 +
   yini[5]*0.062 + yini[6]*0.063)/0.018# mol/m^3

parameters - c(Qm = Qm,
Ceau = Ceau)

liquide(20,yini, dyini,parameters)

z - seq(0, 120, by = 1)  # en seconde

library(deSolve)
out - daspk(y = yini, dy = dyini, times = z, res = liquide, parms = 0)
head(out)
plot(out)

Thanks

Raphaëlle Carraud

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Holt-Winters problem

2013-07-26 Thread Przemek Gawin
This post has NOT been accepted by the mailing list yet.
This post was updated on Jul 26, 2013; 4:40pm.
Hello,

Two days ago I started my journey with R, and right now I'm struck on
Holtwinters, and have no idea, what I'm doing wrong. So now, I'll show
step by step what I've been doing. Actually it is a part of Introducy
Time Series with R.

1) I download data from
http://www.massey.ac.nz/~pscowper/ts/motororg.dat due to port
blockage. Save it as motororg.txt

2) Inserting this code:
Motor.dat - read.table(motororg.txt, header = T)
attach(Motor.dat)
Comp.hw1 - HoltWinters(complaints, beta = 0, gamma = 0)

3) Baang! Error here.
Error in decompose(ts(x[1L:wind], start = start(x), frequency = f),
seasonal) :
 time series has no or less than 2 periods

4) Actually it has, and I turned it on with attach command?

Motor.dat
   complaints
1  27
2  34
3  31
4  24
5  18
6  19
7  17
8  12
9  26
10 14
11 18
12 33
13 31
14 31
15 19
16 17
17 10
18 25
19 26
20 18
21 18
22 10
23  4
24 20
25 28
26 21
27 18
28 23
29 19
30 16
31 10
32 24
33 11
34 11
35 13
36 27
37 18
38 20
39 21
40  6
41  9
42 29
43 12
44 19
45 14
46 19
47 16
48 23

Can you tell me what I am doing wrong?

PS

I tried to make complaints.ts as well but I didn't work either.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] cannot install XML package - getting cannot open URL 'http://cran.cnr.Berkeley.edu/bin/windows/contrib/3.0/XML_3.98-1.1.zip' error

2013-07-26 Thread Grach, Alexander
Hello All, 

 

I am not able to install/update XML package, getting the following:

 

 install.packages(XML, lib = .libPaths()[3])

trying URL
'http://cran.cnr.Berkeley.edu/bin/windows/contrib/3.0/XML_3.98-1.1.zip'

Error in download.file(url, destfile, method, mode = wb, ...) : 

  cannot open URL
'http://cran.cnr.Berkeley.edu/bin/windows/contrib/3.0/XML_3.98-1.1.zip'

In addition: Warning message:

In download.file(url, destfile, method, mode = wb, ...) :

  cannot open: HTTP status was '503 Service Unavailable'

Warning in download.packages(pkgs, destdir = tmpd, available =
available,  :

  download of package 'XML' failed

 

Also have tried to download  the XML_3.98-1.1.zip from
http://cran.r-project.org/web/packages/XML/index.html - getting an
error:

Unable to download XML_3.98-1.1.zip from cran.r-project.org

Unable to open this internet site. The requested site is either
unavailable or cannot be found. Please try again later.

 

Using R 3.0.1 on windows and able to install/update other packages fine.

 

Please help,

 

Many thanks,

Alex

 

 


=
 
This material has been prepared by individual sales and/or trading personnel 
and does not 
constitute investment research.  Please follow the attached hyperlink to an 
important disclaimer: 
http://www.credit-suisse.com/americas/legal/salestrading 
=
 


=== 
Please access the attached hyperlink for an important el...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Repeated measures Cox regression ??coxph??

2013-07-26 Thread Marc Schwartz
Thanks Terry. 

Good points. I recalled last night some exchanges on r-sig-mixed-models 
regarding a reasonable number of 'replications' for the estimation of random 
effects and it occurred to me that with this study, you will have 0, 1 or 2 
events per subject, depending upon the subject risk profiles for hip 
replacement and length of follow up. 

It was not clear to me if John's cohort study is retrospective or prospective. 
If the former, then he will have some insights into the event distribution. If 
the latter and he needs to pre-specify the analytic method, a GEE style 
approach using coxph() may make more sense here given the unknowns.

Regards,

Marc
 
On Jul 26, 2013, at 7:02 AM, Terry Therneau thern...@mayo.edu wrote:

 Two choices.
 If this were a linear model, do you like the GEE approach or a mixed effects 
 approach?  Assume that subject is a variable containing a per-subject 
 identifier.
 
 GEE approach: add + cluster(subject) to the model statement in coxph
 Mixed models approach: Add  + (1|subject) to the model statment in coxme.
 
 When only a very few subjects have multiple events, the mixed model (random 
 effect) approach may not be reliable, however.  Multiple events per group are 
 the fuel for estimation of the variance of the random effect, and with few of 
 these the profile likelihood of the random effect will be very flat.  You can 
 get esssentially a random estimate of the variance of the subject effect.  
 I'm still getting my arms around this issue, and it has taken me a long time.
 
 Frailty is an alternate label for random effects when all we have is a 
 random intercept.  Multiple labels for the same idea adds confusion, but 
 nothing else.
 
 Terry Therneau
 
 On 07/25/2013 08:14 PM, Marc Schwartz wrote:
 On Jul 25, 2013, at 4:45 PM, David Winsemiusdwinsem...@comcast.net  wrote:
 
 On Jul 25, 2013, at 12:27 PM, Marc Schwartz wrote:
 
 On Jul 25, 2013, at 2:11 PM, John Sorkinjsor...@grecc.umaryland.edu  
 wrote:
 
 Colleagues,
 Is there any R package that will allow one to perform a repeated measures 
 Cox Proportional Hazards regression? I don't think coxph is set up to 
 handle this type of problem, but I would be happy to know that I am not 
 correct.
 I am doing a study of time to hip joint replacement. As each person has 
 two hips, a given person can appear in the dataset twice, once for the 
 left hip and once for the right hip, and I need to account for the 
 correlation of data from a single individual.
 Thank you,
 John
 
 
 John,
 
 See Terry's 'coxme' package:
 
 http://cran.r-project.org/web/packages/coxme/index.html
 
 When I looked over the description of coxme, I was concerned it was not 
 really designed with this in mind. Looking at Therneau and Grambsch, I 
 thought section 8.4.2 in the 'Multiple Events per Subject' Chapter fit the 
 analysis question well. There they compared the use of coxph( 
 ...+cluster(ID),,...)  withcoxph( ...+strata(ID),,...). Unfortunately I 
 could not tell for sure which one was being described as superio but I 
 think it was the cluster() alternative. I seem to remember there are 
 discussions in the archives.
 
 David,
 
 I think that you raise a good point. The example in the book (I had to wait 
 to get home to read it) is potentially different however, in that the 
 subject's eye's were randomized to treatment or control, which would seem to 
 suggest comparable baseline characteristics for each pair of eyes, as well 
 as an active intervention on one side where a difference in treatment effect 
 between each eye is being analyzed.
 
 It is not clear from John's description above if there is one hip that will 
 be treated versus one as a control and whether the extent of disease at 
 baseline is similar in each pair of hips. Presumably the timing of hip 
 replacements will be staggered at some level, even if there is comparable 
 disease, simply due to post-op recovery time and surgical risk. In cases 
 where the disease between each hip is materially different, that would be 
 another factor to consider, however I would defer to orthopaedic 
 physicians/surgeons from a subject matter expertise consideration. It is 
 possible that the bilateral hip replacement data might be more of a parallel 
 to bilateral breast cancer data, if each breast were to be tracked 
 separately.
 
 I have cc'd Terry here, hoping that he might jump in and offer some insights 
 into the pros/cons of using coxme versus coxph with either a cluster or 
 strata based approach, or perhaps even a frailty based approach as in 9.4.1 
 in the book.
 
 Regards,
 
 Marc
 
 
 -- 
 David.
 You also might find the following of interest:
 
 http://bjo.bmj.com/content/71/9/645.full.pdf
 
 http://www.ncbi.nlm.nih.gov/pubmed/6885
 
 http://www.ncbi.nlm.nih.gov/pubmed/22078901
 
 
 
 Regards,
 
 Marc Schwartz
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read 

Re: [R] Pairwise comparison between columns, logic

2013-07-26 Thread arun
Hi Manisha,
I didn't run your dataset as I am on the way to college.  But, from the error 
reported, I think it will be due to some missing combinations in one of the 
dataset.  For ex. if you run my previous code without removing CEBPA:
ie.
mat1- combn(gset[,1],2)


lst2-lapply(split(mat1,col(mat1)),function(x) 
{x1-join_all(lst1[x],by=patient_id,type=inner);x1[patient_id] } )
#Error: All inputs to rbind.fill must be data.frames


So, check whether all the combinations are available in the `lst1`.

2. I will get back to you once I run it.
A.K.






From: Manisha manishab...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Friday, July 26, 2013 11:09 AM
Subject: Re: Pairwise comparison between columns, logic



Hi Arun,
I ran the script on a larger dataset and I seem to be running into this 
following error:
Error: All inputs to rbind.fill must be data.frames
after the step;
lst2-lapply(split(mat1,col(mat1)),function(x) 
{x1-join_all(lst1[x],by=firehose_patient_id,type=inner);x1[firehose_patient_id]})
 
I tried a few things to solve the issue but I am not able to. The format of 
input files and data are same as in the code you posted.
Could you suggest me something?
I have attached my input files on which I am trying to run the code. See 
attached code as well. Minor changes have been made by me.

2. I have another question. From your code how do also capture those pairs of 
names that donot have any common patient id?

Thanks again,
-M


On Fri, Jul 26, 2013 at 9:29 AM, arun smartpink...@yahoo.com wrote:

Hi M,
No problem.
Regards,
Arun




- Original Message -
From: manishab...@gmail.com manishab...@gmail.com
To: smartpink...@yahoo.com
Cc:
Sent: Friday, July 26, 2013 9:27 AM
Subject: Re: Pairwise comparison between columns, logic

Thanks for the code. It is elegant and does what I need. Learnt some new 
things.
-M


_
Sent from http://r.789695.n4.nabble.com
  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] X matrix deemed to be singular and cbind

2013-07-26 Thread Bert Gunter
Soumitro:

Have you read An Introduction to R. If not, do so, as some of your
confusion appears related to basic concepts (e.g. of factors)
explained there.

1. Presumably your categorical variables are factors, not character.
If so, when you cbind() them, you cbind their integer codes, yielding
numerical variables. This produces an in incorrect design matrix in
fitting -- 1 df per categorical variable instead of 1 less than the
number of levels. Also see ?cbind.

2. Produces the correct design matrix, but you are overfitting,
presumably because of many different levels for your categorical
variables. I suggest you consult with a local statistician to decide
how best to handle this, as you seem to be out of your depth with
regard to model fitting.

... unless I have misunderstood, of course.

Cheers,
Bert

On Fri, Jul 26, 2013 at 7:55 AM, Soumitro Dey soumitrod...@gmail.com wrote:
 Hi list,

 While the X matrix deemed to be singular question has been answered in
 the list for quite a few times, I have a twist to it.

 I am using the coxph model for survival analysis on a dataset containing
 over 160,000 instances and 46 independent variables and I have 2 scenarios:

 1. If I use cbind on the 46 independent variables (many of which are
 categorical), coxph runs without any frills. The problem however is that it
 won't report which of the categorical variables (e.g. VERY HIGH, HIGH,
 NEUTRAL, LOW or VERY LOW) are actually meaningful/significant(e.g. XHIGH
 ***, XLOW ., etc). Is there any way to check this?

 2. If I don't use cbind, assuming it'll give me the details I am looking
 for in the previous step, it throws me the X matrix deemed to be
 singular, more precisely: X matrix deemed to be singular; variable 130
 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149
 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168
 169 170 171 172 173 174 175 176 177 178 179 180 181

 Could anyone please elaborate on how to get around problem #1 or #2?

 Thanks!
 SD

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Repeated measures Cox regression ??coxph??

2013-07-26 Thread John Sorkin
Marc,
Thank you for your comments. The data has been previously collected, so the 
study is a non-concurrent prospective analysis, i.e. retrospective analysis.
John


John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing) 
 Marc Schwartz marc_schwa...@me.com 07/26/13 11:13 AM 
Thanks Terry. 

Good points. I recalled last night some exchanges on r-sig-mixed-models 
regarding a reasonable number of 'replications' for the estimation of random 
effects and it occurred to me that with this study, you will have 0, 1 or 2 
events per subject, depending upon the subject risk profiles for hip 
replacement and length of follow up. 

It was not clear to me if John's cohort study is retrospective or prospective. 
If the former, then he will have some insights into the event distribution. If 
the latter and he needs to pre-specify the analytic method, a GEE style 
approach using coxph() may make more sense here given the unknowns.

Regards,

Marc
 
On Jul 26, 2013, at 7:02 AM, Terry Therneau thern...@mayo.edu wrote:

 Two choices.
 If this were a linear model, do you like the GEE approach or a mixed effects 
 approach?  Assume that subject is a variable containing a per-subject 
 identifier.
 
 GEE approach: add + cluster(subject) to the model statement in coxph
 Mixed models approach: Add  + (1|subject) to the model statment in coxme.
 
 When only a very few subjects have multiple events, the mixed model (random 
 effect) approach may not be reliable, however.  Multiple events per group are 
 the fuel for estimation of the variance of the random effect, and with few of 
 these the profile likelihood of the random effect will be very flat.  You can 
 get esssentially a random estimate of the variance of the subject effect.  
 I'm still getting my arms around this issue, and it has taken me a long time.
 
 Frailty is an alternate label for random effects when all we have is a 
 random intercept.  Multiple labels for the same idea adds confusion, but 
 nothing else.
 
 Terry Therneau
 
 On 07/25/2013 08:14 PM, Marc Schwartz wrote:
 On Jul 25, 2013, at 4:45 PM, David Winsemiusdwinsem...@comcast.net  wrote:
 
 On Jul 25, 2013, at 12:27 PM, Marc Schwartz wrote:
 
 On Jul 25, 2013, at 2:11 PM, John Sorkinjsor...@grecc.umaryland.edu  
 wrote:
 
 Colleagues,
 Is there any R package that will allow one to perform a repeated measures 
 Cox Proportional Hazards regression? I don't think coxph is set up to 
 handle this type of problem, but I would be happy to know that I am not 
 correct.
 I am doing a study of time to hip joint replacement. As each person has 
 two hips, a given person can appear in the dataset twice, once for the 
 left hip and once for the right hip, and I need to account for the 
 correlation of data from a single individual.
 Thank you,
 John
 
 
 John,
 
 See Terry's 'coxme' package:
 
 http://cran.r-project.org/web/packages/coxme/index.html
 
 When I looked over the description of coxme, I was concerned it was not 
 really designed with this in mind. Looking at Therneau and Grambsch, I 
 thought section 8.4.2 in the 'Multiple Events per Subject' Chapter fit the 
 analysis question well. There they compared the use of coxph( 
 ...+cluster(ID),,...)  withcoxph( ...+strata(ID),,...). Unfortunately I 
 could not tell for sure which one was being described as superio but I 
 think it was the cluster() alternative. I seem to remember there are 
 discussions in the archives.
 
 David,
 
 I think that you raise a good point. The example in the book (I had to wait 
 to get home to read it) is potentially different however, in that the 
 subject's eye's were randomized to treatment or control, which would seem to 
 suggest comparable baseline characteristics for each pair of eyes, as well 
 as an active intervention on one side where a difference in treatment effect 
 between each eye is being analyzed.
 
 It is not clear from John's description above if there is one hip that will 
 be treated versus one as a control and whether the extent of disease at 
 baseline is similar in each pair of hips. Presumably the timing of hip 
 replacements will be staggered at some level, even if there is comparable 
 disease, simply due to post-op recovery time and surgical risk. In cases 
 where the disease between each hip is materially different, that would be 
 another factor to consider, however I would defer to orthopaedic 
 physicians/surgeons from a subject matter expertise consideration. It is 
 possible that the bilateral hip replacement data might be more of a parallel 
 to bilateral breast cancer data, if each breast were to be tracked 
 separately.
 
 I have cc'd Terry here, hoping that he might jump in and offer some insights 
 into the 

[R] Hmisc ctable rotate option obsolete?

2013-07-26 Thread Simon Zehnder
Dear R-Users and R-Devels,

I may have found a deprecated option for the 'latex' function in the Hmisc 
package. I am working with Hmisc and knitr and tried the following code:

 \documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath, amssymb}
\usepackage{ctable}
%\usepackage{booktabs}
\begin{document}
results = 'asis'=
library(Hmisc)
iris.t - head(iris)
iris.t[seq(2, NROW(iris.t), by = 2),] - format(iris.t[seq(2, NROW(iris.t), by 
= 2),], scientific = TRUE)
texMat - matrix(, ncol = 5, nrow = 6)
texMat[seq(2,nrow(texMat), by = 2), ] - scriptsize
latex(iris.t, 
file = '', 
landscape = TRUE,
dcolumn = TRUE,
col.just = c('r','c', 'r','c', 'l'),
cdec = c(0, 0, 1, 1, 0),
na.blank = TRUE,
rowname = '',
rowlabel = '', 
cellTexCmd = texMat,
ctable = TRUE, 
cgroup = c('Observations', ''),
n.cgroup = c(4, 1),
rgroup = c('',''),
n.rgroup = c(3, 3),
caption = 'iris'
)
@
\end{document}

Everything runs fine but the 'landscape' option. It says in the help for 
'latex' that if option 'ctable' is set to TRUE the 'rotate' option for ctable 
is used if 'nadscape' is set TRUE. Looking at the ctable documentary 
(http://texdoc.net/texmf-dist/doc/latex/ctable/ctable.pdf) in section Change 
History, I get for version v1.07: General: Added option sideways, option rotate 
now obsolete. Hasn't this been updated in the Hmisc package?

Best

Simon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Holt-Winters problem

2013-07-26 Thread Berend Hasselman

On 26-07-2013, at 16:18, Przemek Gawin arl...@gmail.com wrote:

 This post has NOT been accepted by the mailing list yet.
 This post was updated on Jul 26, 2013; 4:40pm.
 Hello,
 
 Two days ago I started my journey with R, and right now I'm struck on
 Holtwinters, and have no idea, what I'm doing wrong. So now, I'll show
 step by step what I've been doing. Actually it is a part of Introducy
 Time Series with R.
 
 1) I download data from
 http://www.massey.ac.nz/~pscowper/ts/motororg.dat due to port
 blockage. Save it as motororg.txt
 
 2) Inserting this code:
 Motor.dat - read.table(motororg.txt, header = T)
 attach(Motor.dat)
 Comp.hw1 - HoltWinters(complaints, beta = 0, gamma = 0)
 
 3) Baang! Error here.
 Error in decompose(ts(x[1L:wind], start = start(x), frequency = f),
 seasonal) :
 time series has no or less than 2 periods
 

Your complaints vector is not a timeseries. 

From the section Arguments of HoltWinters:

gamma   parameter used for the seasonal component. If set to FALSE, an 
non-seasonal model is fitted.

So use gamma=FALSE.

Don't use attach().
Use Motor.dat$complaints in the HoltWinter call 

or

Comp.hw1 - with(Motor.dat, HoltWinters(complaints,  beta = 0, gamma = FALSE))


Berend

 4) Actually it has, and I turned it on with attach command?
 
 Motor.dat
   complaints
 1  27
 2  34
 3  31
 4  24
 5  18
 6  19
 7  17
 8  12
 9  26
 10 14
 11 18
 12 33
 13 31
 14 31
 15 19
 16 17
 17 10
 18 25
 19 26
 20 18
 21 18
 22 10
 23  4
 24 20
 25 28
 26 21
 27 18
 28 23
 29 19
 30 16
 31 10
 32 24
 33 11
 34 11
 35 13
 36 27
 37 18
 38 20
 39 21
 40  6
 41  9
 42 29
 43 12
 44 19
 45 14
 46 19
 47 16
 48 23
 
 Can you tell me what I am doing wrong?
 
 PS
 
 I tried to make complaints.ts as well but I didn't work either.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to double integrate a function in R

2013-07-26 Thread Tiago V. Pereira
Hello, R users!

I am trying to double integrate the following expression:

#  expression
(1/(2*pi))*exp(-y2/2)*sqrt((y1/(y2-y1)))

for y2y10.

I am trying the following approach

# first attempt

 library(cubature)
fun - function(x)   { (1/(2*pi))*exp(-x[2]/2)*sqrt((x[1]/(x[2]-x[1])))}
adaptIntegrate(fun, lower = c(0,0), upper =c(5, 6), tol=1e-8)

However, I don't know how to constrain the integration so that y2y10.

Any ideas?

Tiago

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] transform dataframe with look-up table

2013-07-26 Thread Juan Antonio Balbuena

   Hello
   First thank you very much to Jean, Bill, Brian and David for the answers and
   code. I very extremely grateful.
   I am eventually going to adapt Brian's code with a very minor alteration. If
   one follows the original syntax
End - merge(merge(Start, transformer, by.x=Left, by.y=input, 
all.x=TRUE),
  transformer, by.x=Right, by.y=input, all.x=TRUE)

   the Left variables are listed right and vice versa, which seems odd. Just
   swapping Left and Right will put it right (pun intentional):
End - merge(merge(Start, transformer, by.x=Right, by.y=input, 
all.x=TRUE),
  transformer, by.x=Left, by.y=input, all.x=TRUE)

   Best wishes
   Juan Antonio

   El 25/07/2013 18:02, Brian Diggs escribió:

On 7/25/2013 8:13 AM, Juan Antonio Balbuena wrote:

Hello
I hope that there is a simple solution to this apparently complex problem.
Any help will be much appreciated:
I have a dataframe with Left and Right readings (that is, elements in each
row are paired). For instance,
Left Right
 [1]  98
 [2]  43
 [3]  21
 [4]  65
 [5]  31
 [6]  41
 [7]  32
 [8]  42
 [9]  10   8
[10]  9   10
I  need  to  produce a new data frame where the values are transformed
according to a look-up table such as
inputoutput
 [1] 5  1
 [2]10 1
 [3] 4  2
 [4] 8  3
 [5] 6  5
 [6] 5  6
 [7] 7  6
 [8] 2  7
 [9] 9  7
[10]107
[11] 2 8
So  [1, ] in the new dataframe would be 7 3. Quite simple so far, but what
makes things complicated is the multiple outputs for a single input. In thi
s
example, 10 corresponds to 1 and 7 so [9, ] in the input dataframe must
yield two rows in its output counterpart: 1 3 and 7 3. Likewise the output
for  [10, ] should be 7 1 and 7 7. In addition, given that 3 and 1 are
missing as inputs the output for [5, ] should be NA NA.
Thank you very much for your time.
Juan Antonio Balbuena

merge can handle both of these requirements.

First, making the two datasets reproducible:

Start - data.frame(Left=c(9,4,2,6,3,4,3,4,10,9),
 Right=c(8,3,1,5,1,1,2,2,8,10))

transformer - data.frame(input=c(5,10,4,8,6,5,7,2,9,10,2),
   output=c(1,1,2,3,5,6,6,7,7,7,8))

Then add a marker of the original row numbers so that the work can be 
checked more easily later (not really needed for the calculations):

Start$rownum - seq_len(nrow(Start))

Two merge statements with the columns specified and all.x set to TRUE 
(to keep cases even without a match):

End - merge(merge(Start, transformer, by.x=Left, by.y=input, 
all.x=TRUE),
  transformer, by.x=Right, by.y=input, all.x=TRUE)

Then we can look at the output, resorted by the original row numbers:

End[order(End$rownum),]

which gives

Right Left rownum output.x output.y
12 89  173
9  34  22   NA
1  12  37   NA
2  12  38   NA
10 56  456
11 56  451
3  13  5   NA   NA
4  14  62   NA
5  23  7   NA7
6  23  7   NA8
7  24  827
8  24  828
13 8   10  913
14 8   10  973
15109 1071
16109 1077


--

Dr. Juan A. Balbuena
Marine Zoology Unit
Cavanilles Institute of Biodiversity and Evolutionary Biology
University of
Valencia
[1][1]http://www.uv.es/~balbuena
P.O. Box 22085
[2][2]http://www.uv.es/cavanilles/zoomarin/index.htm
46071 Valencia, Spain
[3][3]http://cetus.uv.es/mullpardb/index.html
e-mail: [[4]4]j.a.balbu...@uv.estel. +34 963 543 658fax +34 963 543
 733

NOTE! For shipments by EXPRESS COURIER use the following street address:
C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.


References

1. [5]http://www.uv.es/%7Ebalbuena
2. [6]http://www.uv.es/cavanilles/zoomarin/index.htm
3. [7]http://cetus.uv.es/mullpardb/index.html
4. [8]mailto:j.a.balbu...@uv.es


   --

   Dr. Juan A. Balbuena
   Marine Zoology Unit
   Cavanilles Institute of Biodiversity and Evolutionary Biology
   University of
   Valencia
   [9]http://www.uv.es/~balbuena
   P.O. Box 22085
   [10]http://www.uv.es/cavanilles/zoomarin/index.htm
   46071 Valencia, Spain
   [11]http://cetus.uv.es/mullpardb/index.html
   e-mail: [12]j.a.balbu...@uv.estel. +34 963 543 658fax +34 963 543
   733
   

Re: [R] cannot install XML package - getting cannot open URL 'http://cran.cnr.Berkeley.edu/bin/windows/contrib/3.0/XML_3.98-1.1.zip' error

2013-07-26 Thread Jeff Newmiller
I have no trouble accessing either of those files. Keep in mind that servers do 
not have 100% uptime (and the people responsible for uptime do not monitor this 
forum), and any local network problems you may be having cannot be addressed 
here. So, in general you need to try more servers, confirm the file you want 
does exist, verify that your local network is functioning, and practice a 
little patience.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Grach, Alexander alexander.gr...@credit-suisse.com wrote:
Hello All, 

 

I am not able to install/update XML package, getting the following:

 

 install.packages(XML, lib = .libPaths()[3])

trying URL
'http://cran.cnr.Berkeley.edu/bin/windows/contrib/3.0/XML_3.98-1.1.zip'

Error in download.file(url, destfile, method, mode = wb, ...) : 

  cannot open URL
'http://cran.cnr.Berkeley.edu/bin/windows/contrib/3.0/XML_3.98-1.1.zip'

In addition: Warning message:

In download.file(url, destfile, method, mode = wb, ...) :

  cannot open: HTTP status was '503 Service Unavailable'

Warning in download.packages(pkgs, destdir = tmpd, available =
available,  :

  download of package 'XML' failed

 

Also have tried to download  the XML_3.98-1.1.zip from
http://cran.r-project.org/web/packages/XML/index.html - getting an
error:

Unable to download XML_3.98-1.1.zip from cran.r-project.org

Unable to open this internet site. The requested site is either
unavailable or cannot be found. Please try again later.

 

Using R 3.0.1 on windows and able to install/update other packages
fine.

 

Please help,

 

Many thanks,

Alex

 

 


=

This material has been prepared by individual sales and/or trading
personnel and does not 
constitute investment research.  Please follow the attached hyperlink
to an important disclaimer: 
http://www.credit-suisse.com/americas/legal/salestrading 
=



===

Please access the attached hyperlink for an important
el...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] X matrix deemed to be singular and cbind

2013-07-26 Thread Soumitro Dey
Thanks for your response. You are right - I am new to R and it's
terminologies. I will follow up on your suggestions.




On Fri, Jul 26, 2013 at 11:22 AM, Bert Gunter gunter.ber...@gene.comwrote:

 Soumitro:

 Have you read An Introduction to R. If not, do so, as some of your
 confusion appears related to basic concepts (e.g. of factors)
 explained there.

 1. Presumably your categorical variables are factors, not character.
 If so, when you cbind() them, you cbind their integer codes, yielding
 numerical variables. This produces an in incorrect design matrix in
 fitting -- 1 df per categorical variable instead of 1 less than the
 number of levels. Also see ?cbind.

 2. Produces the correct design matrix, but you are overfitting,
 presumably because of many different levels for your categorical
 variables. I suggest you consult with a local statistician to decide
 how best to handle this, as you seem to be out of your depth with
 regard to model fitting.

 ... unless I have misunderstood, of course.

 Cheers,
 Bert

 On Fri, Jul 26, 2013 at 7:55 AM, Soumitro Dey soumitrod...@gmail.com
 wrote:
  Hi list,
 
  While the X matrix deemed to be singular question has been answered in
  the list for quite a few times, I have a twist to it.
 
  I am using the coxph model for survival analysis on a dataset containing
  over 160,000 instances and 46 independent variables and I have 2
 scenarios:
 
  1. If I use cbind on the 46 independent variables (many of which are
  categorical), coxph runs without any frills. The problem however is that
 it
  won't report which of the categorical variables (e.g. VERY HIGH, HIGH,
  NEUTRAL, LOW or VERY LOW) are actually meaningful/significant(e.g. XHIGH
  ***, XLOW ., etc). Is there any way to check this?
 
  2. If I don't use cbind, assuming it'll give me the details I am looking
  for in the previous step, it throws me the X matrix deemed to be
  singular, more precisely: X matrix deemed to be singular; variable 130
  131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148
 149
  150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167
 168
  169 170 171 172 173 174 175 176 177 178 179 180 181
 
  Could anyone please elaborate on how to get around problem #1 or #2?
 
  Thanks!
  SD
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.



 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:

 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Maintaining data order in factanal with missing data

2013-07-26 Thread David Carlson
When you use complete.cases(), it creates a logical vector
that selects cases with no missing values, but it does not
change the rownames in the data.frame and those are carried
through to the factor scores so you could link them up that
way. Alternatively, you could use na.exclude as the na.action
in the call to factanal() instead of complete.cases()? That
should pad the output with NAs for the cases that have missing
data.

You have to use the formula version of factanal():

 set.seed(42)
 example - data.frame(x=rnorm(15), y=rnorm(15), z=rnorm(15))
 to.na - cbind(sample.int(15, 3), sample.int(3, 3))
 example[to.na] - NA
 out - factanal(~x+y+z, 1, data=example, na.action=na.omit,
scores=regression)
 out$scores   # With na.omit the cases with missing values
are gone as indicated
   Factor1 # by the missing row numbers
2  -0.92604879
4   0.10731539
5  -0.24370504
6   0.07357697
7   0.69905895
8  -0.17646575
9   1.58430095
10 -0.35934769
12  1.07671299
13 -1.47487960
14 -0.30235156
15 -0.05816682
 out - factanal(~x+y+z, 1, data=example,
na.action=na.exclude, scores=regression)
 out$scores   # With na.exclude, the cases are kept out of
the analysis but the rows are
   Factor1 # preserved in the factor scores output
1   NA
2  -0.92604879
3   NA
4   0.10731539
5  -0.24370504
6   0.07357697
7   0.69905895
8  -0.17646575
9   1.58430095
10 -0.35934769
11  NA
12  1.07671299
13 -1.47487960
14 -0.30235156
15 -0.05816682

-
David L Carlson
Associate Professor of Anthropology
Texas AM University
College Station, TX 77840-4352

Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of PIKAL Petr
Sent: Friday, July 26, 2013 9:06 AM
To: s00123...@myacu.edu.au; 'Justin Delahunty';
r-help@r-project.org
Subject: Re: [R] Maintaining data order in factanal with
missing data

Hi

There are probably better options but

merge(data.frame(x=1:154),data.frame(x=names(ab.1.fa[[1]]),
y=ab.1.fa[[1]]), all.x=T)

gives you data frame with NA when there was missing value in
the first data.frame.

You probably can automate the process a bit with nrow
function.

Regards
Petr



 -Original Message-
 From: Justin Delahunty [mailto:a...@genius.net.au]
 Sent: Friday, July 26, 2013 3:34 PM
 To: PIKAL Petr; 'Justin Delahunty'; 'Justin Delahunty';
r-help@r-
 project.org
 Subject: RE: [R] Maintaining data order in factanal with
missing data
 
 Hi Petr,
 
 So sorry, I accidentally attached the complete data set
rather than the
 one with missing values. I've attached the correct file to
this email.
 RE:
 init.dfs() being local, I hadn't even thought of that. I've
been away
 from OOP for close to 15 years now, so it might be time to
revise!
 
 The problem I have is that with missing values the list of
factor
 scores returned (ab.w1.fa$factor.scores) does not map onto
the
 originating data frame (ab.w1.df) as it no longer includes
the cases
 which had missing values. So while the original data set for
ab.w1.df
 contains 154 ordered cases, the factor analysis contains
only 150.
 
 I am seeking a way to map the values derived from the factor
analysis
 (ab.w1.fa$factor.scores) back to their original ordered
position, so
 that these factor score variables may be merged back into
the master
 data frame (ab.df). A unique ID for each case is available
($dmid)
 which I had thought to use when merging the new variables,
however I
 don't know how to implement this.
 
 
 Thanks for your help,
 
 Justin
 
 
 -Original Message-
 From: PIKAL Petr [mailto:petr.pi...@precheza.cz]
 Sent: Friday, 26 July 2013 10:59 PM
 To: Justin Delahunty; Justin Delahunty; r-help@r-project.org
 Subject: RE: [R] Maintaining data order in factanal with
missing data
 
 Hi
 
 Well, the function init.dfs does nothing as all data frames
created
 inside it does not propagate to global environment and there
is nothing
 what the function returns.
 
 Tha last line (when used outside a function) gives warnings
but there
 is no sign of error.
 
 When
 
  head(ab.1.df)
   dmid   g5oab2  g53  g54  g55   g5ovb1
 11 1.418932 1.805227 2.791152 3.624116 3.425586
 22 2.293907 1.187830 1.611237 1.748526 3.816533
 33 2.836536 2.679523 1.279639 2.674986 2.452395
 44 1.872259 3.278359 1.785872 2.458315 1.146480
 55 1.467195 1.180747 3.564127 3.007682 2.109506
 66 3.098512 3.151974 3.969379 3.750571 1.497358
  head(ab.2.df)
   dmid   w2oab3  w22  w23  w24   w2ovb1
 11 4.831362 5.522764 7.809366 6.969172 7.398385
 22 6.706346 4.101742 1.434697 5.266775 5.357641
 33 3.653806 2.666885 1.209326 5.125556 4.963374
 44 7.221255 7.649152 6.540398 6.648506 2.576081
 55 1.848023 5.044314 2.761881 3.307220 1.454234
 66 7.606429 4.911766 2.034813 2.638573 2.818834
  head(ab.3.df)
   dmid   w3oab3   w3oab4   w3oab7   w3oab8   w3ovb1
 11 5.835609 6.108220 6.587721 2.451461 2.785467
 22 4.973198 1.196815 

Re: [R] Saving multiple rda-files as one rda-file

2013-07-26 Thread jim holtman
What you need to ensure is that you have sufficient physical memory for the
operations that you want to do.  I would suggest at least 3X the size of
the object you want to create.  If you have RData files that will result
(after the rbind) into a 40GB object, then you will need over 100GB of
physical memory since the rbind operation will be creating a new object of
40GB from the separate files that total 40GB, so that is 80GB right there.
 If you then want to do some operations on an object that large, you might
be making copies, so you need the memory.

Maybe you should consider keeping the data in a relational database and
then use the SELECT operator to get just the data you need.  Also the
aggregation operators might be useful to reduce the size of physical memory
that you need.


On Fri, Jul 26, 2013 at 11:04 AM, David Winsemius dwinsem...@comcast.netwrote:


 On Jul 25, 2013, at 7:17 AM, Dark wrote:

  Hi,
 
  Yes maybe I should have been more clear on my problem.
  I want to append the different data-frames back into one variable (
 rbind )
  and save it as one R Data file.
 

 Indeed. That was the operation I had in mind when I made my suggestions.
 Perhaps you need to create a set of toy dataframes with similar structure
 and then the audience can propose solutions. That's the usual process
 around these parts.

 --
 David.


  Regards Derk
 
 
 
  --
  View this message in context:
 http://r.789695.n4.nabble.com/Saving-multiple-rda-files-as-one-rda-file-tp4672041p4672313.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

 David Winsemius
 Alameda, CA, USA

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to double integrate a function in R

2013-07-26 Thread David Winsemius

On Jul 26, 2013, at 8:44 AM, Tiago V. Pereira wrote:

 I am trying to double integrate the following expression:
 
 #  expression
 (1/(2*pi))*exp(-y2/2)*sqrt((y1/(y2-y1)))
 
 for y2y10.
 
 I am trying the following approach
 
 # first attempt
 
 library(cubature)
fun - function(x)   { (1/(2*pi))*exp(-x[2]/2)*sqrt((x[1]/(x[2]-x[1])))}
adaptIntegrate(fun, lower = c(0,0), upper =c(5, 6), tol=1e-8)
 
 However, I don't know how to constrain the integration so that y2y10.

Generally incorporating boundaries is accomplished by multiplying the integrand 
with logical vectors that encapsulate what are effectively two conditions: 
Perhaps:

 fun - function(x)   { (x[1]x[2])*(x[1]0)* (1/(2*pi))*exp(-x[2]/2)* 
sqrt((x[1]/(x[2]-x[1])))}

That was taking quite a long time and I interrupted it. There were quite a few 
warnings of the sort
1: In sqrt((x[1]/(x[2] - x[1]))) : NaNs produced
2: In sqrt((x[1]/(x[2] - x[1]))) : NaNs produced

Thinking the NaNs might sabotage the integration process, I added a conditional 
to the section of that expression that was generating the NaNs. I don't really 
know whether NaN's are excluded from the summation process in adaptIntegrate:

 fun - function(x)   { (x[1]x[2])*(x[1]0)* (1/(2*pi))*exp(-x[2]/2)*
  if(x[1]x[2]){ 0 }else{ sqrt((x[1]/(x[2]-x[1])) 
)} }
 adaptIntegrate(fun, lower = c(0,0), upper =c(5, 6) )

I still didn't have the patience to wait for an answer, but I did plot the 
function:

fun2 - function(x,y)   { (xy)*(x0)* (1/(2*pi))*exp(-y/2)* sqrt((x/(y-x)))}
persp(outer(0:5, 0:6, fun2) )

So at least the function is finite over most of its domain.

-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Externalptr class to character class from (web) scrape

2013-07-26 Thread Nick McClure
I'm hitting a wall. When I use the 'scrape' function from the package
'scrapeR' to get the pagesource from a web page, I do the following:
(as an example)

website.doc = parse(http://www.google.com;)

When I look at it, it seems fine:

website.doc[[1]]

This seems to have the information I need.  Then when I try to get it
into a character vector,

character.website = as.character(website.doc[[1]])

I get the error:

Error in as.vector(x, character) :
cannot coerce type 'externalptr' to vector of type 'character'

I'm trying very very hard to wrap my head around how to get this
external pointer to a character, but after reading many help files, I
cannot understand how to do this. Any ideas?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can't figure out why short figure won't work

2013-07-26 Thread Frank Harrell
[Sorry I can't quote past messages as I get mail on Nabble and have been 
told I can't reply through Nabble].


Thanks Kennel for recommended a narrowing range for usr y-limits.  That 
does help quite a bit.


But I found a disappointing aspect of the graphics system: When you 
change height= on the device call you have to change the y-coordinates 
to keep the same absolute vertical positioning.


Frank

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple interaction terms in GAMM-model

2013-07-26 Thread Gavin Simpson
On Thu, 2013-07-25 at 04:56 -0700, Jeroen wrote:
 Dear all,
 
 I am trying to correlate a variable tau1 to a set of other variables (x1,
 x2, x3, x4), taking into account an interaction with time ('doy') and place
 ('region'), and taking into account dependency of data in time per object
 ID. My dataset looks like:

snip /

 Since the data is dependent in time per objectID, I use a GAMM model with an
 autocorrelation function. Since each variable x1, x2, etc. is dependent on
 time and place, I should incorporate this as well. 
 
 Therefore I am wondering if the following gamm-model is correct for my
 situation:
 
 model - gamm( tau1 ~ te( x1, by= doy ) + te( x1, by= factor( region ) ) +
 ... + te( x4, by= doy ) + te( x4, by= factor( region ) ) + factor( region ),
 correlation= corAR1(form= ~ doy|objectID ), na.action= na.omit ).
 
 Does anyone know if this is ok? 

The model looks a little odd - the way you've written it you don't need
`te()` smooths. For the interaction with doy, I would try

te(x1, doy, bs = c(cr, cc))

assuming x1 and doy are continuous. This specifies cubic splines and
cyclic cubic splines respectively for x1 and doy. I use a cyclic spline
for doy as we might not expect Jan 1st and Dec 31st to be much
different. By separating the interaction between x1 and doy and the one
for x1 and region you are saying that the way the effect of x1 varies
through the year does not change between regions. Is that reasonable?

I think you need something for the objectID - a random effect or a fixed
effect will account for differences in mean values of taul between
objects. A random effect seems most useful otherwise you'll eat into
your degrees of freedom, but you have plenty of those.

The correlation part assumes that the residuals follow an AR(1) applied
within the objects, with each AR(1) having the same coefficient.
Autocorrelation decreases exponentially with lag/separation. It is a
reasonable start. Fitting will be much quicker without this. Could you
try fitting without and then extract the normalised residuals to check
for residual correlation?

I hope you have a lot of memory and a lot of spare time available; my
experience of fitting similarly sized (though not as complex - I had one
*very* long series) was that gamm() used large amounts of RAM (I had
16GB and most was used for a single model) and took many days to fit.
You may find the random effect for object fights with the AR(1) so you
could end up with an odd fit if it fits at all.

You are going to want to turn on verbosity when fitting so you can see
how the optimisation is proceeding. Something like

require(nlme)
## need a control object
ctrl - lmeControl(msVerbose = TRUE,
   maxIter = 400,
   msMaxIter = 400,
   niterEM = 0, ## this is VERY important for gamm()!
   tolerance = 1e-8,
   msTol = 1e-8,
   msMaxEval = 400)

then add `control = ctrl` to the `gamm()` call.

I would approach the expecting the model to fail to fit so be prepared
to simplify it etc.

Good luck,

G

 Or should I use a model which also includes terms like  te( x1 ) + ... +
 te( x4 ).
 And is the correlation function correct?
 
 Thanks so much!!
 
 Jeroen
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Multiple-interaction-terms-in-GAMM-model-tp4672297.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Gavin Simpson, PhD  [t] +1 306 337 8863
Adjunct Professor, Department of Biology[f] +1 306 337 2410
Institute of Environmental Change  Society [e] gavin.simp...@uregina.ca
523 Research and Innovation Centre  [tw] @ucfagls
University of Regina
Regina, SK S4S 0A2, Canada

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can't figure out why short figure won't work

2013-07-26 Thread Richard M. Heiberger
lattice graphics allow you to specify units in cm or inches, and then the
absolute vertical positioning can be controlled.

On Fri, Jul 26, 2013 at 12:58 PM, Frank Harrell f.harr...@vanderbilt.eduwrote:

 [Sorry I can't quote past messages as I get mail on Nabble and have been
 told I can't reply through Nabble].

 Thanks Kennel for recommended a narrowing range for usr y-limits.  That
 does help quite a bit.

 But I found a disappointing aspect of the graphics system: When you change
 height= on the device call you have to change the y-coordinates to keep the
 same absolute vertical positioning.

 Frank


 --
 Frank E Harrell Jr Professor and Chairman  School of Medicine
Department of Biostatistics Vanderbilt University

 __**
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/**
 posting-guide.html http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to double integrate a function in R

2013-07-26 Thread David Winsemius

On Jul 26, 2013, at 9:33 AM, David Winsemius wrote:

 fun2 - function(x,y)   { (xy)*(x0)* (1/(2*pi))*exp(-y/2)* sqrt((x/(y-x)))}
 persp(outer(0:5, 0:6, fun2) )

There does seem to be some potential pathology at the edges of the range, 
Restricting it to x = 0.03 removes most of that concern. 

fun2 - function(x,y)   { (xy)*(x0)* (1/(2*pi))*exp(-y/2)* sqrt((x/(y-x)))}
persp(outer(seq(0.01,5,by=.01), seq(.02,6,by=.01), fun2) ,ticktype=detailed)


 fun - function(x)   { (x[1]x[2])*(x[1]0)* 
 (1/(2*pi))*exp(-x[2]/2)*if(x[1]x[2]){0}else{ sqrt((x[1]/(x[2]-x[1])) )}}
   adaptIntegrate(fun, lower = c(0.03,0.03), upper =c(5, 6), tol=1e-2 )
$integral
[1] 0.7605703

$error
[1] 0.00760384

$functionEvaluations
[1] 190859

$returnCode
[1] 0

I tried decreasing the tolerance to 1e-3 but the wait exceeds the patience I 
have allocated to the problem.

-- 
David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Externalptr class to character class from (web) scrape

2013-07-26 Thread Duncan Murdoch

On 26/07/2013 12:43 PM, Nick McClure wrote:

I'm hitting a wall. When I use the 'scrape' function from the package
'scrapeR' to get the pagesource from a web page, I do the following:
(as an example)

website.doc = parse(http://www.google.com;)

When I look at it, it seems fine:

website.doc[[1]]

This seems to have the information I need.  Then when I try to get it
into a character vector,

character.website = as.character(website.doc[[1]])

I get the error:

Error in as.vector(x, character) :
cannot coerce type 'externalptr' to vector of type 'character'

I'm trying very very hard to wrap my head around how to get this
external pointer to a character, but after reading many help files, I
cannot understand how to do this. Any ideas?


You should use str() in cases like this. When I look at 
str(website.doc[[1]]) (after producing website.doc with scrape(), not 
parse()), I see


 str(website.doc[[1]])
Classes 'HTMLInternalDocument', 'HTMLInternalDocument', 
'XMLInternalDocument', 'XMLAbstractDocument' externalptr
- attr(*, headers)= Named chr [1:2] HTMLHEADmeta 
http-equiv=\content-type\ 
content=\text/html;charset=utf-8\\nTITLE302 
Moved/TITLE/HEADBODY\nH1| __truncated__ /BODY/HTML
..- attr(*, names)= chr [1:2] HTMLHEADmeta 
http-equiv=\content-type\ 
content=\text/html;charset=utf-8\\nTITLE302 
Moved/TITLE/HEADBODY\nH1| __truncated__ /BODY/HTML


So it is an external pointer with a number of classes. One or more of 
those will have a print method. methods(print) will list all the print 
methods, and I see there's a (hidden) print.XMLInternalDocument method 
somewhere. Then


 getAnywhere(print.XMLInternalDocument)
A single object matching ‘print.XMLInternalDocument’ was found
It was found in the following places
registered S3 method for print from namespace XML
namespace:XML
with value

function (x, ...)
{
cat(as(x, character), \n)
}
environment: namespace:XML

shows that the as() generic should work, even though as.character() 
doesn't, and indeed as(website.doc[[1]], character) does display 
something.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Externalptr class to character class from (web) scrape

2013-07-26 Thread Duncan Murdoch

By the way, here's a quicker (but slightly more dangerous) way to find the

print.XMLInternalDocument


function:  just call debug(print) before printing the object.  Two steps 
take you to the method, and let you see what it's doing. The danger 
comes because now print() will always trigger the debugger, which can be 
a little confusing!  Remember undebug(print) at the end.


Duncan Murdoch


On 26/07/2013 1:26 PM, Duncan Murdoch wrote:

On 26/07/2013 12:43 PM, Nick McClure wrote:
 I'm hitting a wall. When I use the 'scrape' function from the package
 'scrapeR' to get the pagesource from a web page, I do the following:
 (as an example)

 website.doc = parse(http://www.google.com;)

 When I look at it, it seems fine:

 website.doc[[1]]

 This seems to have the information I need.  Then when I try to get it
 into a character vector,

 character.website = as.character(website.doc[[1]])

 I get the error:

 Error in as.vector(x, character) :
 cannot coerce type 'externalptr' to vector of type 'character'

 I'm trying very very hard to wrap my head around how to get this
 external pointer to a character, but after reading many help files, I
 cannot understand how to do this. Any ideas?

You should use str() in cases like this. When I look at
str(website.doc[[1]]) (after producing website.doc with scrape(), not
parse()), I see

   str(website.doc[[1]])
Classes 'HTMLInternalDocument', 'HTMLInternalDocument',
'XMLInternalDocument', 'XMLAbstractDocument' externalptr
- attr(*, headers)= Named chr [1:2] HTMLHEADmeta
http-equiv=\content-type\
content=\text/html;charset=utf-8\\nTITLE302
Moved/TITLE/HEADBODY\nH1| __truncated__ /BODY/HTML
..- attr(*, names)= chr [1:2] HTMLHEADmeta
http-equiv=\content-type\
content=\text/html;charset=utf-8\\nTITLE302
Moved/TITLE/HEADBODY\nH1| __truncated__ /BODY/HTML

So it is an external pointer with a number of classes. One or more of
those will have a print method. methods(print) will list all the print
methods, and I see there's a (hidden) print.XMLInternalDocument method
somewhere. Then

   getAnywhere(print.XMLInternalDocument)
A single object matching ‘print.XMLInternalDocument’ was found
It was found in the following places
registered S3 method for print from namespace XML
namespace:XML
with value

function (x, ...)
{
cat(as(x, character), \n)
}
environment: namespace:XML

shows that the as() generic should work, even though as.character()
doesn't, and indeed as(website.doc[[1]], character) does display
something.

Duncan Murdoch





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error: Line starting ' ...' is malformed!

2013-07-26 Thread Ross Boylan
A DESCRIPTION file begins with 0xFFFE and
$ file DESCRIPTION 
DESCRIPTION: Little-endian UTF-16 Unicode text, with CRLF, CR line terminators

I think it was created on Windows.

In R (2,15,1 on Debian GNU/Linux), using roxygen2, I get
 roxygenize(../GitHub/mice)
Error: Line starting '��P ...' is malformed!

Enter a frame number, or 0 to exit   

1: roxygenize(../GitHub/mice)
2: read.description(DESCRIPTION)
3: read.dcf(file)

Selection: 3
Called from: read.description(DESCRIPTION)
Browse[1] Q

I'm not sure if the first 2 characters after line starting ', which are octal 
377, 376, will survive email; I stripped them out of the subject line.

The files (DESCRIPTION isn't the only one) have also caused trouble for git 
(even on  Windows 7), since it thinks they are binary.

Any advice about what to do?

I'm reluctant to change the format of the files because it's not my package.

Ross Boylan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Pairwise comparison between columns, logic

2013-07-26 Thread arun
Hi,
Using the example code without removing CEBPA:
gset- read.table(Names.txt,header=TRUE,stringsAsFactors=FALSE)
 temp1- read.table(Data.txt,header=TRUE,stringsAsFactors=FALSE)
lst1-split(temp1,temp1$Names)
mat1-combn(gset[,1],2) 
library(plyr)
lst2-lapply(split(mat1,col(mat1)),function(x){lst1[x][all(lapply(lst1[x],length)==2)]})
lst3-lapply(lst2[lapply(lst2,length)==2],function(x) {x1- 
join_all(x,by=patient_id,type=inner);x2-x1[patient_id];row.names(x2)-if(nrow(x1)!=0)
 paste(x1[,1],x1[,3],1:nrow(x1),sep=_) else NULL;x2 })
 Reduce(rbind,lst3)
#   patient_id
#DNMT3A_FLT3_1 LAML-AB-2811-TB
#DNMT3A_FLT3_2 LAML-AB-2816-TB
#DNMT3A_FLT3_3 LAML-AB-2818-TB
#DNMT3A_IDH1_1 LAML-AB-2802-TB
#DNMT3A_IDH1_2 LAML-AB-2822-TB
#DNMT3A_NPM1_1 LAML-AB-2802-TB
#DNMT3A_NPM1_2 LAML-AB-2809-TB
#DNMT3A_NPM1_3 LAML-AB-2811-TB
#DNMT3A_NPM1_4 LAML-AB-2816-TB
#DNMT3A_NRAS_1 LAML-AB-2816-TB
#FLT3_NPM1_1   LAML-AB-2811-TB
#FLT3_NPM1_2   LAML-AB-2812-TB
#FLT3_NPM1_3   LAML-AB-2816-TB
#FLT3_NRAS_1   LAML-AB-2816-TB
#IDH1_NPM1_1   LAML-AB-2802-TB
#NPM1_NRAS_1   LAML-AB-2816-TB




From your original dataset:
gset- read.table(SampleGenes.txt,header=TRUE,stringsAsFactors=FALSE) 
temp0- 
read.table(LAML-TB.final_analysis_set.txt,header=TRUE,stringsAsFactors=FALSE,sep=\t)
 
 temp1- temp0[,c(Hugo_Symbol,firehose_patient_id)]
 str(temp1)
#'data.frame':    2221 obs. of  2 variables:
# $ Hugo_Symbol    : chr  TBX15 TCHHL1 DNMT3A IDH1 ...
# $ firehose_patient_id: chr  LAML-AB-2802-TB LAML-AB-2802-TB 
LAML-AB-2802-TB LAML-AB-2802-TB ...
lst1-split(temp1,temp1$Hugo_Symbol) 
 length(lst1)
#[1] 1607
mat1-combn(gset[,1],2) # Generate all
lst2-lapply(split(mat1,col(mat1)),function(x){lst1[x][all(lapply(lst1[x],length)==2)]})
 
 length(lst2)
#[1] 105


 lst3-lapply(lst2[lapply(lst2,length)==2],function(x) {x1- 
join_all(x,by=firehose_patient_id,type=inner);x2-x1[firehose_patient_id];row.names(x2)-if(nrow(x1)!=0)
 paste(x1[,1],x1[,3],1:nrow(x1),sep=_) else NULL;x2 })
res-Reduce(rbind,lst3)
 nrow(res)
#[1] 234
head(res)
#    firehose_patient_id
#NPM1_FLT3_1 LAML-AB-2811-TB
#NPM1_FLT3_2 LAML-AB-2812-TB
#NPM1_FLT3_3 LAML-AB-2816-TB
#NPM1_FLT3_4 LAML-AB-2818-TB
#NPM1_FLT3_5 LAML-AB-2825-TB
#NPM1_FLT3_6 LAML-AB-2836-TB




Regarding your second question:
setdiff(gset[,1],unique(temp1[,1])) # CEBPA was not found in the temp1[,1]
#[1] CEBPA
mat2- combn(gset[-5,1],2)
vec1- apply(mat2,2,paste,collapse=_)
vec2-unique(gsub((.*\\_.*)\\_.*,\\1,row.names(res)))
setdiff(vec1,vec2)
 #[1] NPM1_TP53   NPM1_EZH2   NPM1_RUNX1  NPM1_ASXL1  NPM1_KDM6A 
 #[6] FLT3_TP53   FLT3_EZH2   FLT3_KRAS   FLT3_ASXL1  FLT3_KDM6A 
#[11] IDH1_TP53   IDH1_KRAS   NRAS_IDH2   NRAS_KRAS   NRAS_ASXL1 
#[16] NRAS_KDM6A  TP53_EZH2   TP53_IDH2   TP53_RUNX1  TP53_KRAS  
#[21] TP53_WT1    TP53_ASXL1  TP53_KDM6A  EZH2_IDH2   EZH2_WT1   
#[26] EZH2_ASXL1  EZH2_KDM6A  IDH2_TET2   IDH2_KDM6A  RUNX1_KDM6A
#[31] KRAS_WT1    KRAS_KDM6A  WT1_ASXL1   WT1_TET2    WT1_KDM6A  
#[36] ASXL1_TET2  ASXL1_KDM6A TET2_KDM6A 
A.K.


- Original Message -
From: arun smartpink...@yahoo.com
To: Manisha manishab...@gmail.com
Cc: R help r-help@r-project.org
Sent: Friday, July 26, 2013 11:18 AM
Subject: Re: Pairwise comparison between columns, logic

Hi Manisha,
I didn't run your dataset as I am on the way to college.  But, from the error 
reported, I think it will be due to some missing combinations in one of the 
dataset.  For ex. if you run my previous code without removing CEBPA:
ie.
mat1- combn(gset[,1],2)


lst2-lapply(split(mat1,col(mat1)),function(x) 
{x1-join_all(lst1[x],by=patient_id,type=inner);x1[patient_id] } )
#Error: All inputs to rbind.fill must be data.frames


So, check whether all the combinations are available in the `lst1`.

2. I will get back to you once I run it.
A.K.






From: Manisha manishab...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Friday, July 26, 2013 11:09 AM
Subject: Re: Pairwise comparison between columns, logic



Hi Arun,
I ran the script on a larger dataset and I seem to be running into this 
following error:
Error: All inputs to rbind.fill must be data.frames
after the step;
lst2-lapply(split(mat1,col(mat1)),function(x) 
{x1-join_all(lst1[x],by=firehose_patient_id,type=inner);x1[firehose_patient_id]})
 
I tried a few things to solve the issue but I am not able to. The format of 
input files and data are same as in the code you posted.
Could you suggest me something?
I have attached my input files on which I am trying to run the code. See 
attached code as well. Minor changes have been made by me.

2. I have another question. From your code how do also capture those pairs of 
names that donot have any common patient id?

Thanks again,
-M


On Fri, Jul 26, 2013 at 9:29 AM, arun smartpink...@yahoo.com wrote:

Hi M,
No problem.
Regards,
Arun




- Original Message -
From: manishab...@gmail.com manishab...@gmail.com
To: smartpink...@yahoo.com
Cc:
Sent: Friday, July 26, 2013 

Re: [R] Pairwise comparison between columns, logic

2013-07-26 Thread biobee
Hi Arun,

Many thanks for following this through. Based on your earlier  suggestion
with CEBPA, I also edited the code and now it seems work (I removed the non
exiting names).  I looked into the current code that you have sent below
and I  learnt a few more things. So thank you again.
Your skills in R is very good. Could you suggest some resources for advance
R programming. I am fairly comfortable with R, but need to keep abreast
with the latest R libraries that can make life a lot simpler.

Thanks again,
-M

On Fri, Jul 26, 2013 at 2:20 PM, arun kirshna [via R] 
ml-node+s789695n4672460...@n4.nabble.com wrote:

 Hi,
 Using the example code without removing CEBPA:
 gset- read.table(Names.txt,header=TRUE,stringsAsFactors=FALSE)
  temp1- read.table(Data.txt,header=TRUE,stringsAsFactors=FALSE)
 lst1-split(temp1,temp1$Names)
 mat1-combn(gset[,1],2)
 library(plyr)
 lst2-lapply(split(mat1,col(mat1)),function(x){lst1[x][all(lapply(lst1[x],length)==2)]})

 lst3-lapply(lst2[lapply(lst2,length)==2],function(x) {x1-
 join_all(x,by=patient_id,type=inner);x2-x1[patient_id];row.names(x2)-if(nrow(x1)!=0)
 paste(x1[,1],x1[,3],1:nrow(x1),sep=_) else NULL;x2 })
  Reduce(rbind,lst3)
 #   patient_id
 #DNMT3A_FLT3_1 LAML-AB-2811-TB
 #DNMT3A_FLT3_2 LAML-AB-2816-TB
 #DNMT3A_FLT3_3 LAML-AB-2818-TB
 #DNMT3A_IDH1_1 LAML-AB-2802-TB
 #DNMT3A_IDH1_2 LAML-AB-2822-TB
 #DNMT3A_NPM1_1 LAML-AB-2802-TB
 #DNMT3A_NPM1_2 LAML-AB-2809-TB
 #DNMT3A_NPM1_3 LAML-AB-2811-TB
 #DNMT3A_NPM1_4 LAML-AB-2816-TB
 #DNMT3A_NRAS_1 LAML-AB-2816-TB
 #FLT3_NPM1_1   LAML-AB-2811-TB
 #FLT3_NPM1_2   LAML-AB-2812-TB
 #FLT3_NPM1_3   LAML-AB-2816-TB
 #FLT3_NRAS_1   LAML-AB-2816-TB
 #IDH1_NPM1_1   LAML-AB-2802-TB
 #NPM1_NRAS_1   LAML-AB-2816-TB




 From your original dataset:
 gset- read.table(SampleGenes.txt,header=TRUE,stringsAsFactors=FALSE)
 temp0-
 read.table(LAML-TB.final_analysis_set.txt,header=TRUE,stringsAsFactors=FALSE,sep=\t)

  temp1- temp0[,c(Hugo_Symbol,firehose_patient_id)]
  str(temp1)
 #'data.frame':2221 obs. of  2 variables:
 # $ Hugo_Symbol: chr  TBX15 TCHHL1 DNMT3A IDH1 ...
 # $ firehose_patient_id: chr  LAML-AB-2802-TB LAML-AB-2802-TB
 LAML-AB-2802-TB LAML-AB-2802-TB ...
 lst1-split(temp1,temp1$Hugo_Symbol)
  length(lst1)
 #[1] 1607
 mat1-combn(gset[,1],2) # Generate all
 lst2-lapply(split(mat1,col(mat1)),function(x){lst1[x][all(lapply(lst1[x],length)==2)]})

  length(lst2)
 #[1] 105


  lst3-lapply(lst2[lapply(lst2,length)==2],function(x) {x1-
 join_all(x,by=firehose_patient_id,type=inner);x2-x1[firehose_patient_id];row.names(x2)-if(nrow(x1)!=0)
 paste(x1[,1],x1[,3],1:nrow(x1),sep=_) else NULL;x2 })
 res-Reduce(rbind,lst3)
  nrow(res)
 #[1] 234
 head(res)
 #firehose_patient_id
 #NPM1_FLT3_1 LAML-AB-2811-TB
 #NPM1_FLT3_2 LAML-AB-2812-TB
 #NPM1_FLT3_3 LAML-AB-2816-TB
 #NPM1_FLT3_4 LAML-AB-2818-TB
 #NPM1_FLT3_5 LAML-AB-2825-TB
 #NPM1_FLT3_6 LAML-AB-2836-TB




 Regarding your second question:
 setdiff(gset[,1],unique(temp1[,1])) # CEBPA was not found in the temp1[,1]
 #[1] CEBPA
 mat2- combn(gset[-5,1],2)
 vec1- apply(mat2,2,paste,collapse=_)
 vec2-unique(gsub((.*\\_.*)\\_.*,\\1,row.names(res)))
 setdiff(vec1,vec2)
  #[1] NPM1_TP53   NPM1_EZH2   NPM1_RUNX1  NPM1_ASXL1  NPM1_KDM6A
  #[6] FLT3_TP53   FLT3_EZH2   FLT3_KRAS   FLT3_ASXL1  FLT3_KDM6A
 #[11] IDH1_TP53   IDH1_KRAS   NRAS_IDH2   NRAS_KRAS   NRAS_ASXL1
 #[16] NRAS_KDM6A  TP53_EZH2   TP53_IDH2   TP53_RUNX1  TP53_KRAS
 #[21] TP53_WT1TP53_ASXL1  TP53_KDM6A  EZH2_IDH2   EZH2_WT1
 #[26] EZH2_ASXL1  EZH2_KDM6A  IDH2_TET2   IDH2_KDM6A
 RUNX1_KDM6A
 #[31] KRAS_WT1KRAS_KDM6A  WT1_ASXL1   WT1_TET2WT1_KDM6A
 #[36] ASXL1_TET2  ASXL1_KDM6A TET2_KDM6A
 A.K.


 - Original Message -
 From: arun [hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4672460i=0

 To: Manisha [hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4672460i=1

 Cc: R help [hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4672460i=2

 Sent: Friday, July 26, 2013 11:18 AM
 Subject: Re: Pairwise comparison between columns, logic

 Hi Manisha,
 I didn't run your dataset as I am on the way to college.  But, from the
 error reported, I think it will be due to some missing combinations in one
 of the dataset.  For ex. if you run my previous code without removing
 CEBPA:
 ie.
 mat1- combn(gset[,1],2)


 lst2-lapply(split(mat1,col(mat1)),function(x)
 {x1-join_all(lst1[x],by=patient_id,type=inner);x1[patient_id] } )
 #Error: All inputs to rbind.fill must be data.frames


 So, check whether all the combinations are available in the `lst1`.

 2. I will get back to you once I run it.
 A.K.





 
 From: Manisha [hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4672460i=3

 To: arun [hidden 
 email]http://user/SendEmail.jtp?type=nodenode=4672460i=4

 Sent: Friday, July 26, 2013 11:09 AM
 Subject: Re: Pairwise comparison between columns, logic



 Hi Arun,
 I ran the script on a larger dataset and I seem to 

Re: [R] Duplicated function with conditional statement

2013-07-26 Thread Uwe Ligges



On 25.07.2013 21:05, vanessa van der vaart wrote:

Hi everybody,,
I have a question about R function duplicated(). I have spent days try to
figure this out,but I cant find any solution yet. I hope somebody can help
me..
this is my data:

subj=c(1,1,1,2,2,3,3,3,4,4)
response=c('sample','sample','buy','sample','buy','sample','
sample','buy','sample','buy')
product=c(1,2,3,2,2,3,2,1,1,4)
tt=data.frame(subj, response, product)

the data look like this:

  subj response product
1 1   sample   1
2 1   sample   2
3 1  buy  3
4 2   sample   2
5 2 buy   2
6 3   sample   3
7 3   sample   2
8 3 buy   1
9 4  sample   1
10   4   buy4

I want to create new  column based on the value on response and product
column. if the value on product is duplicated, then  the value on new column
is 1, otherwise is 0.



According to your description:

tt$newcolumn - as.integer(duplicated(tt$product)  tt$response==buy)

which is different from what you show us below, where I cannot derive 
any systematic rule from.


Uwe Ligges


but I want to add conditional statement that the value on product column
will only be considered as duplicated if the value on response column is
'buy'.
for illustration, the table should look like this:

subj response product newcolumn
1 1   sample   1  0
2 1   sample   2  0
3 1  buy  3  0
4 2   sample   2  0
5 2 buy   2  0
6 3   sample   3  1
7 3   sample   2   1
8 3 buy   1   0
9 4  sample   11
10   4   buy   4 0


can somebody help me?
any help will be appreciated.
I am new in this mailing list, so forgive me in advance, If I did not  ask
the question appropriately.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SpatialPolygonsDataFrame and unique()

2013-07-26 Thread Nicola Rossi
Thank you very much for the help, that's what I was looking for!

Also my apologies for sending an html, I thought I turned it off
(hopefully this time works).

Best regards,

Nicola

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] add different regression lines for groups on ggplot

2013-07-26 Thread Ye Lin
Hey All,

I need to apply different regression lines to different group on my ggplot,
and here is the code I use:

qplot(x=Var1,y=Var2,data=df,color=SiteID,group=SiteID)+geom_point()+geom_smooth(method='lm',formula=log(y)~I(1/x),se=FALSE,size=2)

However the regression for different groups is as below:

AL1/AL2: log(y)~I(1/x)
AL3: log(y)~log(x)

How can I apply each regression equation on the same ggplot?

Also I have an issue that if I use the code above, the regression lines are
not overlapped on top of my data points.

Thanks for your help!
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] readTiff - Sorry can't handle images with 32-bit samples

2013-07-26 Thread wwreith
I tried using readTiff() and got the error message Sorry can't handle images
with 32-bit samples

line of code

x - readTiff(C:/Users/550062/Desktop/Data/example1.tif)

So far I have not had any luck finding this error message on google. Any
guess at what it means and how to get the code to work?

Thanks!



--
View this message in context: 
http://r.789695.n4.nabble.com/readTiff-Sorry-can-t-handle-images-with-32-bit-samples-tp4672465.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Duplicated function with conditional statement

2013-07-26 Thread David Winsemius

On Jul 26, 2013, at 11:51 AM, Uwe Ligges wrote:

 
 
 On 25.07.2013 21:05, vanessa van der vaart wrote:
 Hi everybody,,
 I have a question about R function duplicated(). I have spent days try to
 figure this out,but I cant find any solution yet. I hope somebody can help
 me..
 this is my data:
 
 subj=c(1,1,1,2,2,3,3,3,4,4)
 response=c('sample','sample','buy','sample','buy','sample','
 sample','buy','sample','buy')
 product=c(1,2,3,2,2,3,2,1,1,4)
 tt=data.frame(subj, response, product)
 
 the data look like this:
 
  subj response product
 1 1   sample   1
 2 1   sample   2
 3 1  buy  3
 4 2   sample   2
 5 2 buy   2
 6 3   sample   3
 7 3   sample   2
 8 3 buy   1
 9 4  sample   1
 10   4   buy4
 
 I want to create new  column based on the value on response and product
 column. if the value on product is duplicated, then  the value on new column
 is 1, otherwise is 0.
 
 
 According to your description:
 

Agree that the description did not match the output. I tried to match the 
output using a rule that could be expressed as: 

 if( a buy- associated product value precedes the current product 
value){1}else{0}

-- 
David.

 tt$newcolumn - as.integer(duplicated(tt$product)  tt$response==buy)
 
 which is different from what you show us below, where I cannot derive any 
 systematic rule from.
 
 Uwe Ligges
 
 but I want to add conditional statement that the value on product column
 will only be considered as duplicated if the value on response column is
 'buy'.
 for illustration, the table should look like this:
 
 subj response product newcolumn
 1 1   sample   1  0
 2 1   sample   2  0
 3 1  buy  3  0
 4 2   sample   2  0
 5 2 buy   2  0
 6 3   sample   3  1
 7 3   sample   2   1
 8 3 buy   1   0
 9 4  sample   11
 10   4   buy   4 0
 
 
 can somebody help me?
 any help will be appreciated.
 I am new in this mailing list, so forgive me in advance, If I did not  ask
 the question appropriately.
 
  [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Duplicated function with conditional statement

2013-07-26 Thread David Winsemius

On Jul 26, 2013, at 2:06 PM, David Winsemius wrote:

 
 On Jul 26, 2013, at 11:51 AM, Uwe Ligges wrote:
 
 
 
 On 25.07.2013 21:05, vanessa van der vaart wrote:
 Hi everybody,,
 I have a question about R function duplicated(). I have spent days try to
 figure this out,but I cant find any solution yet. I hope somebody can help
 me..
 this is my data:
 
 subj=c(1,1,1,2,2,3,3,3,4,4)
 response=c('sample','sample','buy','sample','buy','sample','
 sample','buy','sample','buy')
 product=c(1,2,3,2,2,3,2,1,1,4)
 tt=data.frame(subj, response, product)
 
 the data look like this:
 
 subj response product
 1 1   sample   1
 2 1   sample   2
 3 1  buy  3
 4 2   sample   2
 5 2 buy   2
 6 3   sample   3
 7 3   sample   2
 8 3 buy   1
 9 4  sample   1
 10   4   buy4
 
 I want to create new  column based on the value on response and product
 column. if the value on product is duplicated, then  the value on new column
 is 1, otherwise is 0.
 
 
 According to your description:
 
 
 Agree that the description did not match the output. I tried to match the 
 output using a rule that could be expressed as: 
 
 if( a buy- associated product value precedes the current product 
 value){1}else{0}
 

So this delivers the specified output:

tt$rown - rownames(tt)
as.numeric ( apply(tt, 1, function(x) { 
 x['product'] %in% tt[ rownames(tt)  x['rown']  tt$response == buy, 
product]  } ) )

# [1] 0 0 0 0 0 1 1 0 1 0

 -- 
 David.
 
 tt$newcolumn - as.integer(duplicated(tt$product)  tt$response==buy)
 
 which is different from what you show us below, where I cannot derive any 
 systematic rule from.
 
 Uwe Ligges
 
 but I want to add conditional statement that the value on product column
 will only be considered as duplicated if the value on response column is
 'buy'.
 for illustration, the table should look like this:
 
 subj response product newcolumn
 1 1   sample   1  0
 2 1   sample   2  0
 3 1  buy  3  0
 4 2   sample   2  0
 5 2 buy   2  0
 6 3   sample   3  1
 7 3   sample   2   1
 8 3 buy   1   0
 9 4  sample   11
 10   4   buy   4 0
 
 
 can somebody help me?
 any help will be appreciated.
 I am new in this mailing list, so forgive me in advance, If I did not  ask
 the question appropriately.
 
 [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 David Winsemius
 Alameda, CA, USA
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] readTiff - Sorry can't handle images with 32-bit samples

2013-07-26 Thread Jeff Newmiller
Try reading a different file. The error message says the existing code cannot 
read that file.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

wwreith reith_will...@bah.com wrote:
I tried using readTiff() and got the error message Sorry can't handle
images
with 32-bit samples

line of code

x - readTiff(C:/Users/550062/Desktop/Data/example1.tif)

So far I have not had any luck finding this error message on google.
Any
guess at what it means and how to get the code to work?

Thanks!



--
View this message in context:
http://r.789695.n4.nabble.com/readTiff-Sorry-can-t-handle-images-with-32-bit-samples-tp4672465.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] readTiff - Sorry can't handle images with 32-bit samples

2013-07-26 Thread Peter Langfelder
Disclaimer: I haven't seen your tif file and I know nothing about
readTiff... but here go some general comments.

TIF files can use different bit depths (number of bits to store each
pixel (or each color for each pixel). Most common software outputs 8-
or 16-bits, but your file probably has a higher bit depth of 32 bits
per sample. Apparently readTiff cannot handle such bit depth.

You may need to convert the 32-bit delth file(s) into 16-bit depth (or
whatever readTiff can handle). My suggestion would be to look at
ImageMagick, but you may also be able to use some image editing
applications to do that,

Peter

On Fri, Jul 26, 2013 at 1:51 PM, wwreith reith_will...@bah.com wrote:
 I tried using readTiff() and got the error message Sorry can't handle images
 with 32-bit samples

 line of code

 x - readTiff(C:/Users/550062/Desktop/Data/example1.tif)

 So far I have not had any luck finding this error message on google. Any
 guess at what it means and how to get the code to work?

 Thanks!



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/readTiff-Sorry-can-t-handle-images-with-32-bit-samples-tp4672465.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Is it possible (/reasonable) to write an as.RefClass function for R list class?

2013-07-26 Thread Tal Galili
Hello all.


I am interested in creating a mutable list object in R.

Ideally I would do it through something like this:

x - list(a = list(1:4), b = miao, c = 4:1)
x_RC - as.RefClass(x)
attr(x_RC[[2]], I am) - string
x_RC[[3]] - x_RC[[3]]+1
x_new - as.list(x_RC)

Is that reasonably possible to create? Or does that make little/no-sense?

Thanks.




Contact
Details:---
Contact me: tal.gal...@gmail.com |
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is it possible (/reasonable) to write an as.RefClass function for R list class?

2013-07-26 Thread Jeff Newmiller
Reference classes are implemented with environment objects, which are mutable. 
Once you convert to list, the converted object is not mutable, since there is 
no such thing as a mutable list. So I would say your request is not possible.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Tal Galili tal.gal...@gmail.com wrote:
Hello all.


I am interested in creating a mutable list object in R.

Ideally I would do it through something like this:

x - list(a = list(1:4), b = miao, c = 4:1)
x_RC - as.RefClass(x)
attr(x_RC[[2]], I am) - string
x_RC[[3]] - x_RC[[3]]+1
x_new - as.list(x_RC)

Is that reasonably possible to create? Or does that make
little/no-sense?

Thanks.




Contact
Details:---
Contact me: tal.gal...@gmail.com |
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew)
|
www.r-statistics.com (English)
--

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Boxcox transformation error

2013-07-26 Thread Miller Ruiz
Hello

I'm trying to run the script below for making a boxcox transformation of
some variables contained on  an excel file, but i can't get it. I ever have
the same message :
error : $ operator is invalid for atomic vectors

One of the names of the variables is Ec30 and it's the variable I put as
example.

require(RODBC)
require(fBasics)
require(e1071)
require(MASS)
setwd(C:\\estadisticafijo)
dir()
temp=odbcConnectExcel(prueba35r)
rs-sqlFetch(temp,Hoja1)
close(temp)
fix(rs)
boxcox(rs$Ec30)

Thaks a lot for your help.

Miller Ruiz
Assistant Research
Cenipalma
Colombia

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GGplot 2 – cannot get histogram and box plot axis to match.

2013-07-26 Thread John W. Clow
Dear Thierry,

Thank you for your help. My code now works the way I wanted it to.

To other readers out there, I think the reason why the two functions work 
differently is because extend_limits only ensures that the axis includes the 
coordinates supplied, and does not guarantee that limits will be the 
coordinates you supplied to the function (they might extend further in some 
graphs like the heatmap)

Thierry, if any of the above is wrong or incomplete, feel free to correct me or 
add more details.

John Clow
UCSB Student

From: ONKELINX, Thierry [thierry.onkel...@inbo.be]
Sent: Friday, July 26, 2013 1:03 AM
To: John W. Clow; r-help@r-project.org
Subject: RE: GGplot 2 – cannot get histogram and box plot axis to match.

Dear John,

Use xlim() and ylim() instead of expand_limits()

library(ggplot2)

#sample data from ggplot2
data(Cars93, package = MASS)
dataSet - Cars93

#variables to calculate the range to extend the axis dataVector - 
unlist(dataSet[,MPG.city])

dataRange - diff(range(dataSet$MPG.city))

graphRange - c(min(dataSet$MPG.city) - dataRange/5,
max(dataSet$MPG.city) + dataRange/5)

#making the box plot
theBoxPlot - ggplot(dataSet,aes_string(x = MPG.city, y = MPG.city))

theBoxPlot -
  theBoxPlot  + geom_boxplot() + coord_flip() + ylim(limits = graphRange)
print(theBoxPlot)


#making the histogram
thePlot - ggplot(dataSet,aes_string(x = MPG.city))
thePlot - thePlot + geom_histogram()  + xlim(graphRange)
print(thePlot)



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
Forest
team Biometrie  Kwaliteitszorg / team Biometrics  Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-Oorspronkelijk bericht-
Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens 
John W. Clow
Verzonden: donderdag 25 juli 2013 20:03
Aan: r-help@r-project.org
Onderwerp: [R] GGplot 2 – cannot get histogram and box plot axis to match.

Problem:
I am trying to get the histogram and box plot x axis to match. I’ve tried using 
the expand_limits function to make the axis match but that didn’t make the axis 
match. The histogram’s axis are still consistently larger than the ones for the 
box plot (though the function did help). Does anyone have a suggestion as to 
what I should do instead?


Background:
I am building a Shiny app that displays a histogram below a bar chart for a set 
of data that a user uploads to the app. If you want to see the app, go here 
http://spark.rstudio.com/jclow/Archive20130725HistogramApp/
To run the app, select “Use Sample Data” , then select  “MPG.city” under choose 
a column, then finally select box plot.


Sample code:
Below is a snippet of my code to demonstrate the problems I have.

library(ggplot2)

#sample data from ggplot2
data(Cars93, package = MASS)
dataSet - Cars93

#variables to calculate the range to extend the axis dataVector - 
unlist(dataSet[,MPG.city])

dataRange - max(dataVector) - min(dataVector)

graphRange - c(min(dataVector) - dataRange/5,
max(dataVector) + dataRange/5)

#making the box plot
theBoxPlot - ggplot(dataSet,aes_string(x = MPG.city,y = MPG.city))

theBoxPlot = theBoxPlot  + geom_boxplot() + expand_limits(y= graphRange) + 
coord_flip()
print(theBoxPlot)


#making the histogram
thePlot - ggplot(dataSet,aes_string(x = MPG.city)) thePlot -thePlot + 
geom_histogram()  + expand_limits(x= graphRange)

print(thePlot)


Thank you for taking the time to read this.

John Clow
UCSB Student

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
* * * * * * * * * * * * * D I S C L A I M E R * * * * * * * * * * * * *
Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en 
binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is 
door een geldig ondertekend document.
The views expressed in this message and any annex are purely those of the 
writer and may not be regarded as stating an official position of INBO, as long 
as the message is not confirmed by a duly signed document.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

[R] matching columns of model matrix to those in original data.frame

2013-07-26 Thread Ross Boylan
What is a reliable way to go from a column of a model matrix back to the column 
(or columns) of the original data source used to make the model 
matrix?  I can come up with a method that seems to work, but I don't see 
guarantees in the documentation that it will.

In particular, does the order of the term.labels match the order of columns for 
factors in a terms object?  The documentation says the model.matrix 
assign attribute uses the ordering of terms.labels.

If anyone can tell me if this approach is reliable, or of one that is, I would 
appreciate it.

Ross Boylan

Proposed function and a little example follow.

# return a vector v such that data[,v[i]] contributed to mm[,i]
# mm = model matrix produced by
# form = formula
# data = data
reverse.map - function(mm, form, data){
tt - terms(form, data=data)
ttf - attr(tt, factors)
mmi - attr(mm, assign)
# this depends on assign using same order as columns of factors
# entries in mmi that are 0 (the intercept) are silently dropped
ttf2 - ttf[,mmi]
# take the first row that contributes
r - apply(ttf2, 2, function(is) rownames(ttf)[is  0][1])
match(r, colnames(data))
}

 ### experiment with mapping model matrix to original columns
 df - sp2b[sample(nrow(sp2b), 8), c(pEthnic, ethnic_sg, rac_gay)]
 form - ~pEthnic+ethnic_sg*rac_gay
 mm - model.matrix(form, df)
 tt - terms(form, data=df)
 ttf - attr(tt, factors)
 mmi - attr(mm, assign)
 df
  pEthnic ethnic_sg rac_gay
1366 Afr Amer  Afr Amer3.25
3052 Afr Amer  Afr Amer1.75
3012   Latino  Afr Amer2.00
369  Afr Amer  Asian/PI2.00
529 White  Asian/PI2.00
194  Asian/PI  Asian/PI3.25
126 White  Asian/PI2.25
2147   LatinoLatino2.75
 colnames(mm)
 [1] (Intercept)   pEthnicAsian/PI  
 [3] pEthnicLatino pEthnicOther 
 [5] pEthnicWhite  ethnic_sgAsian/PI
 [7] ethnic_sgLatino   rac_gay  
 [9] ethnic_sgAsian/PI:rac_gay ethnic_sgLatino:rac_gay  
 ttf  # term factors
  pEthnic ethnic_sg rac_gay ethnic_sg:rac_gay
pEthnic 1 0   0 0
ethnic_sg   0 1   0 1
rac_gay 0 0   1 1
 mmi  #model matrix assign
 [1] 0 1 1 1 1 2 2 3 4 4
 reverse.map(mm, form, df)
[1] 1 1 1 1 2 2 3 2 2

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Duplicated function with conditional statement

2013-07-26 Thread arun


On some slightly different datasets:
tt1-structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 
6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L, 
2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
1L, 2L, 1L), .Label = c(buy, sample), class = factor), 
    product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 4, 2, 2, 4, 5, 
    5, 4, 3, 4, 5, 4, 2)), .Names = c(subj, response, product
), class = data.frame, row.names = c(NA, 22L))

tt2- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 
6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L, 
2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 
1L, 2L, 2L), .Label = c(buy, sample), class = factor), 
    product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 4, 5, 1, 4, 
    2, 3, 3, 2, 5, 3, 4)), .Names = c(subj, response, product
), class = data.frame, row.names = c(NA, 22L))

tt3- structure(list(subj = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 
6, 6, 6, 7, 7, 7, 8, 8, 8), response = structure(c(2L, 2L, 1L, 
2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 
1L, 1L, 2L), .Label = c(buy, sample), class = factor), 
    product = c(1, 2, 3, 2, 2, 3, 2, 1, 1, 4, 1, 1, 3, 5, 2, 
    2, 2, 2, 4, 3, 2, 5)), .Names = c(subj, response, product
), class = data.frame, row.names = c(NA, 22L))


#Tried David's solution:
tt1$rown - rownames(tt1)
as.numeric ( apply(tt1, 1, function(x) {
    x['product'] %in% tt1[ rownames(tt1)  x['rown']  tt1$response == buy, 
product]  } ) )
  #gave inconsistent results especially since the first 10 rows were from `tt`
# [1] 0 1 1 1 1 1 1 0 1 0 1 0 0 1 0 0 1 0 1 0 1 1

#similarly for `tt2` and `tt3`.


##Created this function.  It seems to work in the tested cases, though it is 
not tested extensively.
fun1- function(dat,colName,newColumn){
  indx- which(dat[,colName]==buy)
  dat[,newColumn]-0
  dat[unlist(lapply(seq_along(indx),function(i){
            x1- if(i==length(indx)){
                seq(indx[i],nrow(dat))
             }
            else if((indx[i+1]-indx[i])==1){
            indx[i]
            }
            else {
            seq(indx[i]+1,indx[i+1]-1)
             }
            x2- dat[unique(c(indx[i:1],x1)),]
            x3- subset(x2,response==sample)
            x4- subset(x2,response==buy)
            if(nrow(x3)!=0) {
                row.names(x3)[x3$product%in% x4$product]
                       }
                                    
            })),newColumn]-1
    dat

    }
fun1(tt,response,newCol)
#   subj response product rown newCol
#1 1   sample   1    1  0
#2 1   sample   2    2  0
#3 1  buy   3    3  0
#4 2   sample   2    4  0
#5 2  buy   2    5  0
#6 3   sample   3    6  1
#7 3   sample   2    7  1
#8 3  buy   1    8  0
#9 4   sample   1    9  1
#10    4  buy   4   10  0

fun1(tt1,response,newCol)
#   subj response product newCol
#1 1   sample   1  0
#2 1   sample   2  0
#3 1  buy   3  0
#4 2   sample   2  0
#5 2  buy   2  0
#6 3   sample   3  1
#7 3   sample   2  1
#8 3  buy   1  0
#9 4   sample   1  1
#10    4  buy   4  0
#11    5  buy   4  0
#12    5   sample   2  1
#13    5  buy   2  0
#14    6  buy   4  0
#15    6   sample   5  0
#16    6   sample   5  0
#17    7   sample   4  1
#18    7  buy   3  0
#19    7  buy   4  0
#20    8  buy   5  0
#21    8   sample   4  1
#22    8  buy   2  0
#Also
 fun1(tt2,response,newCol)
 fun1(tt3,response,newCol)
A.K.

P.S.  Below is OP's clarification regarding the conditional statement in a 
private message:

I am sorry i didnt question it very clearly, let me change the 
conditional statement, I hope you can understand. i will explain by 
example

as you can see, almost every number is duplicated, but only in row 6th,7th,and 
9th the value on column is 1.

on row4th, the value is duplicated( 2 already occurred on 2nd row),but 
since the value is considered as duplicated only if the value is 
duplicated where the response is 'buy' than the value on column, on 
row4th still zero. 

On row 6th, where the value product column is 3. 3 is already occurred 
in 3rd row where the value on response is 'buy', so the value on column 
should be 1

I hope it can understand the conditional statement. 








- Original Message -
From: David Winsemius dwinsem...@comcast.net
To: David Winsemius dwinsem...@comcast.net
Cc: R-help@r-project.org; Uwe Ligges lig...@statistik.tu-dortmund.de
Sent: Friday, July 26, 2013 5:16 PM
Subject: Re: [R] Duplicated function with conditional statement


On Jul 26, 2013, at 2:06 PM, David Winsemius wrote:

 
 On Jul 26, 2013, at 11:51 AM, Uwe Ligges wrote:
 
 
 
 On 

Re: [R] Combine multiple random forests contained in a list

2013-07-26 Thread arun
HI,
Using the example in ?combine
library(randomForest)
rf1 - randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
  rf2 - randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
  rf3 - randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
  rf.all - combine(rf1, rf2, rf3)
lst1- list(rf1,rf2,rf3)

rf.allL- do.call(`combine`,lst1)
#or
rf.allL- Reduce(`combine,lst1)
 identical(rf.all,rf.allL)
#[1] TRUE
A.K.


Is there a quick and easy way to pass randomForest objects contained in a list 
into the combine() function? 

As a result of calling randomForest through lapply(), I now have 10 
randomForests in a list (rfors) 

I want to combine all 10 of them. Understandably combine(rfors) 
doesn't work as it doesn't recognise the individual forests within the 
list. I have spend quite sometime messing around with unlist(), lapply()
 and apply() to try and extract the information in a suitable format but
 to no avail. The only thing that works is combine(rfors[[1]], 
rfors[[2]] ...etc). 

This is a bit cumbersome though, not least because the number of
 random forests I'll need to combine is likely to change. Any sleek and 
elegant solution to this someone can suggest? 

Thanks in advance for any help. 

Anna

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem with ldpaths and new R

2013-07-26 Thread Erin Hodgess
Hello!

I have just installed R on an Ubutnu machine (13.04) and keep getting the
following:

erin@erin-Lenovo-IdeaPad-Y480:~$ R
/usr/bin/R: line 236: /usr/lib/R/etc/ldpaths: No such file or directory

R version 3.0.1 (2013-05-16) -- Good Sport
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

 install.packages(pbdMPI,depen=TRUE)
Installing package into ‘/home/erin/lib/R/library’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
also installing the dependency ‘rlecuyer’

trying URL '
http://cran.revolutionanalytics.com/src/contrib/rlecuyer_0.3-3.tar.gz'
Content type 'application/x-gzip' length 11756 bytes (11 Kb)
opened URL
==
downloaded 11 Kb

trying URL '
http://cran.revolutionanalytics.com/src/contrib/pbdMPI_0.1-8.tar.gz'
Content type 'application/x-gzip' length 417547 bytes (407 Kb)
opened URL
==
downloaded 407 Kb

/usr/lib/R/bin/R: line 140: /usr/lib/R/etc/ldpaths: No such file or
directory
/usr/lib/R/bin/R: line 236: /usr/lib/R/etc/ldpaths: No such file or
directory
Error in file(con, r) : cannot open the connection
Calls: Anonymous - sub - grep - readLines - file
In addition: Warning message:
In file(con, r) :
  cannot open file '/usr/lib/R/etc/Makeconf': No such file or directory
/usr/lib/R/bin/R: line 140: /usr/lib/R/etc/ldpaths: No such file or
directory
/usr/lib/R/bin/R: line 236: /usr/lib/R/etc/ldpaths: No such file or
directory
Error in file(con, r) : cannot open the connection
Calls: Anonymous - sub - grep - readLines - file
In addition: Warning message:
In file(con, r) :
  cannot open file '/usr/lib/R/etc/Makeconf': No such file or directory

The downloaded source packages are in
‘/tmp/RtmpAtAxNL/downloaded_packages’
Warning messages:
1: In install.packages(pbdMPI, depen = TRUE) :
  installation of package ‘rlecuyer’ had non-zero exit status
2: In install.packages(pbdMPI, depen = TRUE) :
  installation of package ‘pbdMPI’ had non-zero exit status



I uninstalled it, deleted the /etc/R and /usr/lib/R directories, and
re-installed.  (Actually, I followed that process twice).

I'm still stuck.  Does anyone have any suggestions, please?

Thanks,
Erin


-- 
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with ldpaths and new R

2013-07-26 Thread David Winsemius

On Jul 26, 2013, at 9:45 PM, Erin Hodgess wrote:

 Hello!
 
 I have just installed R on an Ubutnu machine (13.04)

Did you follow directions given here:

http://cran.us.r-project.org/bin/linux/ubuntu/

Users who need to compile R packages from source [e.g. package maintainers, or 
anyone installing packages with install.packages()] should also install the 
r-base-dev package:

   sudo apt-get install r-base-dev


I ask because one ot the other directions that you did not follow was:

The best place to report problems with these packages or ask R questions 
specific to Ubuntu is the R-SIG-Debian mailing list. See

   https://stat.ethz.ch/mailman/listinfo/r-sig-debian


-- 
David.

 and keep getting the
 following:
 
 erin@erin-Lenovo-IdeaPad-Y480:~$ R
 /usr/bin/R: line 236: /usr/lib/R/etc/ldpaths: No such file or directory
 
 R version 3.0.1 (2013-05-16) -- Good Sport
 Copyright (C) 2013 The R Foundation for Statistical Computing
 Platform: x86_64-pc-linux-gnu (64-bit)
 
 R is free software and comes with ABSOLUTELY NO WARRANTY.
 You are welcome to redistribute it under certain conditions.
 Type 'license()' or 'licence()' for distribution details.
 
  Natural language support but running in an English locale
 
 R is a collaborative project with many contributors.
 Type 'contributors()' for more information and
 'citation()' on how to cite R or R packages in publications.
 
 Type 'demo()' for some demos, 'help()' for on-line help, or
 'help.start()' for an HTML browser interface to help.
 Type 'q()' to quit R.
 
 install.packages(pbdMPI,depen=TRUE)
 Installing package into Œ/home/erin/lib/R/library‚
 (as Œlib‚ is unspecified)
 --- Please select a CRAN mirror for use in this session ---
 also installing the dependency Œrlecuyer‚
 
 trying URL '
 http://cran.revolutionanalytics.com/src/contrib/rlecuyer_0.3-3.tar.gz'
 Content type 'application/x-gzip' length 11756 bytes (11 Kb)
 opened URL
 ==
 downloaded 11 Kb
 
 trying URL '
 http://cran.revolutionanalytics.com/src/contrib/pbdMPI_0.1-8.tar.gz'
 Content type 'application/x-gzip' length 417547 bytes (407 Kb)
 opened URL
 ==
 downloaded 407 Kb
 
 /usr/lib/R/bin/R: line 140: /usr/lib/R/etc/ldpaths: No such file or
 directory
 /usr/lib/R/bin/R: line 236: /usr/lib/R/etc/ldpaths: No such file or
 directory
 Error in file(con, r) : cannot open the connection
 Calls: Anonymous - sub - grep - readLines - file
 In addition: Warning message:
 In file(con, r) :
  cannot open file '/usr/lib/R/etc/Makeconf': No such file or directory
 /usr/lib/R/bin/R: line 140: /usr/lib/R/etc/ldpaths: No such file or
 directory
 /usr/lib/R/bin/R: line 236: /usr/lib/R/etc/ldpaths: No such file or
 directory
 Error in file(con, r) : cannot open the connection
 Calls: Anonymous - sub - grep - readLines - file
 In addition: Warning message:
 In file(con, r) :
  cannot open file '/usr/lib/R/etc/Makeconf': No such file or directory
 
 The downloaded source packages are in
Œ/tmp/RtmpAtAxNL/downloaded_packages‚
 Warning messages:
 1: In install.packages(pbdMPI, depen = TRUE) :
  installation of package Œrlecuyer‚ had non-zero exit status
 2: In install.packages(pbdMPI, depen = TRUE) :
  installation of package ŒpbdMPI‚ had non-zero exit status
 
 
 
 I uninstalled it, deleted the /etc/R and /usr/lib/R directories, and
 re-installed.  (Actually, I followed that process twice).
 
 I'm still stuck.  Does anyone have any suggestions, please?
 
 Thanks,
 Erin
 
 
 -- 
 Erin Hodgess
 Associate Professor
 Department of Computer and Mathematical Sciences
 University of Houston - Downtown
 mailto: erinm.hodg...@gmail.com
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Combine multiple random forests contained in a list

2013-07-26 Thread Bert Gunter
I would say that the use of Reduce in this context is bad practice.

from ?Reduce :

Reduce uses a binary function to successively combine the elements of
a given vector and a possibly given initial value.

combine() is obviously not a binary function. do.call() seems to be
THE appropriate idiom.

-- Bert

On Fri, Jul 26, 2013 at 9:26 PM, arun smartpink...@yahoo.com wrote:
 HI,
 Using the example in ?combine
 library(randomForest)
 rf1 - randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
   rf2 - randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
   rf3 - randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
   rf.all - combine(rf1, rf2, rf3)
 lst1- list(rf1,rf2,rf3)

 rf.allL- do.call(`combine`,lst1)
 #or
 rf.allL- Reduce(`combine,lst1)
  identical(rf.all,rf.allL)
 #[1] TRUE
 A.K.


 Is there a quick and easy way to pass randomForest objects contained in a 
 list into the combine() function?

 As a result of calling randomForest through lapply(), I now have 10 
 randomForests in a list (rfors)

 I want to combine all 10 of them. Understandably combine(rfors)
 doesn't work as it doesn't recognise the individual forests within the
 list. I have spend quite sometime messing around with unlist(), lapply()
  and apply() to try and extract the information in a suitable format but
  to no avail. The only thing that works is combine(rfors[[1]],
 rfors[[2]] ...etc).

 This is a bit cumbersome though, not least because the number of
  random forests I'll need to combine is likely to change. Any sleek and
 elegant solution to this someone can suggest?

 Thanks in advance for any help.

 Anna

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.