Re: [R] Using R to analyze multiple MRI studies
Yes, I did look through the CRAN task view and could not find any package with a function that transforms an MRI set into Talairach or MNI space.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Using R to analyze multiple MRI studies
I need to analyze multiple T1 contrast-enhanced MRI studies from different patients. They are all in DICOM format. I see that there are several packages for loading individual studies in DICOM format; however, so far I have had limited luck finding out how the different studies can be transformed into MNI or Talairach space. Is there an R implementation of this? Best, M
[R] Plot survival analysis with time dependent variables
Dear all, Is there an implementation of the Simon and Makuch method for plotting the survival function with time-dependent variables? I'm only able to find event.chart in Hmisc for this purpose, and I would prefer the Simon and Makuch method. I believe Stata has it implemented, but I cannot find it on CRAN. Simon R, Makuch RW. A non-parametric graphical representation of the relationship between survival and the occurrence of an event: application to responder versus non-responder bias. Statistics in Medicine 1984; 3: 35-44. //M
Re: [R] Plot survival analysis with time dependent variables
I did, but I could only find the citation, not an implementation.
On Tuesday, January 1, 2013, David Winsemius wrote:
Have you done any searching? http://markmail.org/search/?q=list%3Aorg.r-project.r-help+Simon+%26+Makuch#query:list%3Aorg.r-project.r-help%20Simon%20%26%20Makuch+page:1+mid:p7kssr6awkrwnnoo+state:results
-- David Winsemius, MD Alameda, CA, USA
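A sketch of one commonly used route in R, under stated assumptions: it is not a dedicated Simon and Makuch function. Each patient's follow-up is split at the time the time-dependent covariate changes (survival's tmerge()), and survfit() is then called on counting-process (tstart, tstop] data, so patients move from the non-responder curve to the responder curve when their status changes and the risk sets are updated accordingly. The data frame d and its columns futime, status and resp_time are hypothetical.

```r
library(survival)

# Hypothetical toy data: resp_time is when a patient became a responder
# (NA = never responded during follow-up).
d <- data.frame(id        = 1:6,
                futime    = c(10, 12, 8, 15, 20, 6),
                status    = c(1, 0, 1, 1, 0, 1),
                resp_time = c(3, NA, NA, 5, 7, NA))

# Build (tstart, tstop] intervals: a responder contributes a non-responder
# interval up to resp_time and a responder interval afterwards.
long <- tmerge(d[, c("id", "futime", "status")], d, id = id,
               death = event(futime, status),
               resp  = tdc(resp_time))

# Product-limit curves with risk sets updated as patients switch groups
fit <- survfit(Surv(tstart, tstop, death) ~ resp, data = long)
plot(fit, lty = 1:2, xlab = "Time", ylab = "Survival")
legend("bottomleft", legend = c("Non-responder", "Responder"), lty = 1:2)
```

Because responders contribute person-time to the non-responder group until they respond, this avoids the immortal-time problem that a fixed baseline grouping would create.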
[R] mChoice prb in rms
Dear all, I'm trying to get an output table for age and the summary of a and b, stratified by epo, using summary.formula as follows:
h <- data.frame(a = sample(c("A", NA), 100, replace = TRUE),
                b = sample(c("B", NA), 100, replace = TRUE),
                age = rnorm(100, 50, 25),
                epo = sample(c("Y", "N"), 100, TRUE))
library(rms)
summary.formula(epo ~ age + mChoice(a, b, label = "test"), method = "reverse",
                data = h, na.rm = TRUE, test = TRUE)

Descriptive Statistics by epo
+---------+--------------+--------------+------------------------+
|         |N             |Y             |  Test                  |
|         |(N=51)        |(N=49)        |Statistic               |
+---------+--------------+--------------+------------------------+
|age      |43.8/53.5/74.6|30.8/48.8/69.3|F=1.68 d.f.=1,98 P=0.198|
+---------+--------------+--------------+------------------------+
|test : NA|        0% (0)|        0% (0)|                        |
+---------+--------------+--------------+------------------------+
|A        |        0% (0)|        0% (0)|                        |
+---------+--------------+--------------+------------------------+
|B        |        0% (0)|        0% (0)|                        |
+---------+--------------+--------------+------------------------+

Digging deeper, I find that
summary(mChoice(h$a, h$b))
h$a
 4 unique combinations
Frequencies of Numbers of Choices Per Observation
nchoices
 1  2
30 70
Pairwise Frequencies (Diagonal Contains Marginal Frequencies)
0 x 0 matrix
Frequencies of All Combinations
  NA  A;B NA;A NA;B
  30   24   23   23

Somehow this doesn't carry over into the summary.formula output. Also, running example(mChoice) reproduces this: the 0% categories are shown there too. Any ideas? //M
Re: [R] ctree and survival problem
sessionInfo yields the following:
sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 tcltk splines grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] ipred_0.8-8 class_7.3-3 mlbench_2.1-0 rpart_3.1-46 party_0.9-1 vcd_1.2-9 colorspace_1.0-1 strucchange_1.4-3 coin_1.0-18 mvtnorm_0.9-95
[11] modeltools_0.2-17 Design_2.3-0 car_2.0-8 nnet_7.3-1 sandwich_2.2-6 boot_1.2-43 Amelia_1.2-18 lmtest_0.9-27 zoo_1.6-4 MASS_7.3-7
[21] mgcv_1.7-2 cem_1.0.142 randomForest_4.6-2 lattice_0.19-17 nlme_3.1-97 rgl_0.92.798 Hmisc_3.8-3 survival_2.36-2 ggplot2_0.8.9 proto_0.3-8
[31] reshape_0.8.3 plyr_1.4 foreign_0.8-41

loaded via a namespace (and not attached):
[1] cluster_1.13.2 digest_0.4.2 Matrix_0.999375-46 tools_2.11.1

As I described in the original post, I ran example(ctree), and the following part yields an error:
if (require(ipred)) {
  data("GBSG2", package = "ipred")
  GBSG2ct <- ctree(Surv(time, cens) ~ ., data = GBSG2)
  plot(GBSG2ct)
  treeresponse(GBSG2ct, newdata = GBSG2[1:2, ])
}
Error in Summary.Surv(c(1814, 2018, 712, 1807, 772, 448, 2172, 2161, 471, :
  Invalid operation on a survival time
//M
On 28. apr. 2011, at 14.19, Jonathan Daily wrote:
It would help people who know more about R's guts than me if you posted your sessionInfo() output and exactly what commands produced your error. It is also recommended that you try simply upgrading R to the latest version and see if you get an error with the latest version of 'party'. My guess is that the error will go away.
On Wed, Apr 27, 2011 at 3:40 PM, moleps mole...@gmail.com wrote:
Forgot to mention that the ctree command is from the party library. //M
--
=== Jon Daily Technician ===
#!/usr/bin/env outside
# It's great, trust me.
Re: [R] ctree and survival problem
Yup, that's the culprit. Thanks. //M
On 28. apr. 2011, at 18.36, Achim Zeileis wrote:
On Thu, 28 Apr 2011, moleps wrote:
sessionInfo yields the following:
OK, the Design package causes the problem here. When you load the Design package, it provides a new Surv() and related methods. This clashes with the computations of ctree() based on Surv(). So it's better not to load both packages simultaneously...
Z
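Achim's diagnosis (Design masking Surv()) can be checked from R itself. A minimal sketch: find() lists every attached package that defines Surv, and Design is detached, if attached, before refitting. It assumes that in current package versions the GBSG2 data ship with TH.data rather than ipred, and the refit is guarded in case party or TH.data is not installed.

```r
library(survival)

# Every attached package that provides a Surv(); more than one entry
# means the first one on the search path masks the others.
find("Surv")

# Detach Design if it is attached, so survival's Surv() is used again
if ("package:Design" %in% search()) {
  detach("package:Design", unload = TRUE)
}

# Re-run the failing part of example(ctree), guarded in case the
# packages are not installed
if (requireNamespace("party", quietly = TRUE) &&
    requireNamespace("TH.data", quietly = TRUE)) {
  library(party)
  data("GBSG2", package = "TH.data")
  GBSG2ct <- ctree(Surv(time, cens) ~ ., data = GBSG2)
  plot(GBSG2ct)
}
```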
[R] ctree and survival problem
Dear all, I was intrigued by the ctree command and wanted to check it out. I first ran the demo with example(ctree) and did get the survival graphs at the end. When I then ran it on my own data, I got an "Invalid operation on a survival time" error. I tried to rerun example(ctree), and now I also get "Invalid operation on a survival time" after the example runs plot(GBSG2ct). Any ideas what is going on here? I'm running R 2.11.1. //M
[R] ctree and survival problem
Forgot to mention that the ctree command is from the party library. //M
[R] MatchIt and sensitivity analysis
Dear all, Is there a package that allows me to run a sensitivity analysis on a matched dataset created with MatchIt? I am aware of both rbounds and the sensitivity function in the twang package, but neither accepts MatchIt objects as input. //M
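One possible workaround, sketched under assumptions: for 1:1 nearest-neighbour matching, MatchIt's match.matrix records which control was matched to each treated unit, so the matched outcome pairs can be pulled out by hand and passed to rbounds::psens(), which computes Rosenbaum bounds for a continuous outcome. The lalonde data, the covariates and the Gamma grid are illustrative only, not a recommendation.

```r
library(MatchIt)
library(rbounds)

data("lalonde", package = "MatchIt")

# 1:1 nearest-neighbour matching on a propensity score (illustrative covariates)
m.out <- matchit(treat ~ age + educ + married + nodegree + re74 + re75,
                 data = lalonde, method = "nearest", ratio = 1)

# Rows = treated units, entry = row name of the matched control
mm <- m.out$match.matrix
mm <- mm[!is.na(mm[, 1]), , drop = FALSE]

# Outcome for each matched pair
y.treat <- lalonde[rownames(mm), "re78"]
y.ctrl  <- lalonde[mm[, 1], "re78"]

# Rosenbaum bounds (Wilcoxon signed-rank) for hidden bias up to Gamma = 2
psens(y.treat, y.ctrl, Gamma = 2, GammaInc = 0.25)
```

For a binary outcome, rbounds also has binarysens(), which works on counts of discordant pairs rather than on the outcome vectors.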
Re: [R] p value for joint probability
My terminology is probably way off. I'll try again in plain English. I'd like to generate a scatter plot of r1 versus r2 and color-code each pair according to the probability of observing the pair, given that the two samples (r1, r2) are drawn from two independent normal distributions.
rr <- data.frame(r1 = rnorm(1000, 10, 5), r2 = rnorm(1000, 220, 5))
with(rr, plot(r1, r2))
Best, //M
On 31. jan. 2011, at 23.13, Peter Ehlers wrote:
On 2011-01-31 12:42, moleps wrote:
Dear all, Given rr <- data.frame(r1 = rnorm(1000, 10, 5), r2 = rnorm(1000, 220, 5)), how can I add a column (rr$p) for the joint probability of each r1, r2 pair?
If you take the values in each pair to be observations from two independent Normal distributions, it's easy: the joint probability of those values is zero. But I suspect you mean something else by joint probability. Can you elaborate?
Peter Ehlers
I know how to add the column. I just don't know how to compute the p-value for joint probabilities given the two samples. //M
Re: [R] p value for joint probability
Alright. I appreciate the input on non-zero terminology (:-). What I wanted was:
library(MASS)  # for kde2d()
rr <- data.frame(r1 = rnorm(1000, 10, 5), r2 = rnorm(1000, 220, 5))
with(rr, plot(r1, r2))
r3 <- with(rr, kde2d(r1, r2, lims = c(2, 18, 200, 240)))
filled.contour(r3)
//M
On 1. feb. 2011, at 21.26, David Winsemius wrote:
On Feb 1, 2011, at 2:31 PM, moleps wrote:
My terminology is probably way off. I'll try again in plain English. I'd like to generate a scatter plot of r1 r2 and color code each pair according to the probability of observing the pair given that the two samples (r1 r2) are drawn from two independent normal distributions.
The answer is still zero. If you want to ask a different question that might have a non-zero answer, it might be: "How can I color points on the basis of their joint density with an assumption of no correlation?" Then you might get a better answer. Densities are not probabilities. You would need to specify whether the arguments to the rnorm functions (i.e. the theoretical values) were to be used, or whether you intended to use sample values for mean and sd.
David Winsemius, MD West Hartford, CT
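Following David's reformulation, a sketch that colours each point by its joint density under independence, i.e. the product of the two marginal normal densities. It plugs in the sample means and SDs; using the theoretical values (10, 5 and 220, 5) instead is an equally valid choice. As noted in the thread, these are densities, not probabilities, and the number of colour bins is arbitrary.

```r
set.seed(1)
rr <- data.frame(r1 = rnorm(1000, 10, 5), r2 = rnorm(1000, 220, 5))

# Joint density assuming independence: f(r1, r2) = f1(r1) * f2(r2)
rr$dens <- with(rr, dnorm(r1, mean(r1), sd(r1)) * dnorm(r2, mean(r2), sd(r2)))

# Bin the densities into 10 classes; the highest-density bin is drawn in red
bins <- cut(rr$dens, breaks = 10, labels = FALSE)
cols <- rev(heat.colors(10))[bins]
plot(r2 ~ r1, data = rr, col = cols, pch = 19, cex = 0.6)
```

Unlike the kde2d() approach above, this assumes the normal, uncorrelated model rather than estimating the density from the data.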
[R] SetInternet2, RCurl and proxy
Dear all, Using the SetInternet2(TRUE) option works wonders with R in my locked-down work environment. However, I'd like to use RCurl, and apparently the proxy settings are not carried over. Is it possible to find out the proxy IP and port number from R after invoking SetInternet2? //M
Re: [R] SetInternet2, RCurl and proxy
I know that the proxy can usually be found in Internet Explorer. However, the hospital's IE version has been altered so that nothing is visible, and the IT department is not very keen on revealing the proxy settings.
On Mon, Jan 31, 2011 at 2:08 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
No, but it should be from your browser: all SetInternet2 does is to switch to use Internet Explorer internals. cURL (and hence RCurl) knows nothing about IE's settings. Note that if all you need is the proxy details, then you don't need SetInternet2: see ?download.file. However, many sites need to authenticate to the proxy
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
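If the proxy host and port can be obtained, a sketch of passing them explicitly: download.file() honours the http_proxy environment variable (see ?download.file), and RCurl can be given libcurl's proxy option directly. The address below is a placeholder, not a real proxy; proxies that require authentication also need credentials (libcurl's proxyuserpwd option), which is not shown.

```r
# Hypothetical proxy address; replace with the real host:port
proxy <- "http://proxy.example.org:8080"

# Read by download.file() (see ?download.file)
Sys.setenv(http_proxy = proxy)

# RCurl does not read IE's settings, so pass the proxy as a libcurl option
if (requireNamespace("RCurl", quietly = TRUE)) {
  opts <- RCurl::curlOptions(proxy = proxy)
  # page <- RCurl::getURL("http://www.r-project.org/", .opts = opts)
}
```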
[R] p value for joint probability
Dear all, Given rr <- data.frame(r1 = rnorm(1000, 10, 5), r2 = rnorm(1000, 220, 5)), how can I add a column (rr$p) for the joint probability of each r1, r2 pair? I know how to add the column; I just don't know how to compute the p-value for joint probabilities given the two samples. //M
[R] Hmisc, summary.formula and catTest
Dear all, I'm specifying Fisher's exact test (fisher.test) for use with summary.formula as follows:
u <- function(a, b) {
  j <- fisher.test(a)
  p <- list(P = j$p.value, stat = NA, df = NA, testname = j$method, statname = "")
  return(p)
}
However, I'm also required to specify stat and df, which don't apply to the Fisher test. I've tried specifying them as NA, among other things, without success: the result is either a blank or an error message about trying to round a non-numeric value. Reproducible example:
library(Hmisc)
library(survival)  # pbc data
ex <- pbc
summary(trt ~ sex + ascites, data = ex, test = TRUE, method = "reverse")
summary(trt ~ sex + ascites, data = ex, test = TRUE, method = "reverse", catTest = u)
The closest I get is
u <- function(a, b) {
  j <- fisher.test(a)
  p <- list(P = j$p.value, stat = 1, df = 1, testname = j$method, statname = "")
  return(p)
}
However, then I have to edit the output manually. Is there a smart way of doing this? //M
Re: [R] Hmisc, summary.formula and catTest
Alright, works like a charm. However, I do believe the prtest argument should have been mentioned in the documentation for the catTest and conTest options. I appreciate your time and effort. Best, //M
On 6. jan. 2011, at 23.24, Erik Iverson wrote:
Does the prtest argument help when you actually use the 'print' function around your summary.formula object? I think that's how I solve it. I.e.,
sf1 <- summary(trt ~ sex + ascites, data = ex, test = TRUE, method = "reverse", catTest = u)
print(sf1, prtest = "P")

Descriptive Statistics by trt
+-------+---+---------+---------+-------+
|       |N  |1        |2        |P-value|
|       |   |(N=158)  |(N=154)  |       |
+-------+---+---------+---------+-------+
|sex : f|418|87% (137)|90% (139)|  0.377|
+-------+---+---------+---------+-------+
|ascites|312| 9% ( 14)| 6% ( 10)|  0.526|
+-------+---+---------+---------+-------+
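Putting the pieces of this exchange together, a self-contained sketch: a catTest function that runs fisher.test() on the frequency table it receives, with placeholder stat and df values, and prtest = "P" at print time so only the P-value column is shown. It assumes the pbc data from the survival package (ex <- pbc in the original); fisherTest is just an illustrative name.

```r
library(Hmisc)
library(survival)  # provides the pbc data set

# catTest receives a frequency table; stat/df are placeholders that are
# hidden at print time, since Fisher's test reports no statistic or df
fisherTest <- function(tab) {
  ft <- fisher.test(tab)
  list(P = ft$p.value, stat = 1, df = 1,
       testname = ft$method, statname = "")
}

sf1 <- summary(trt ~ sex + ascites, data = pbc, method = "reverse",
               test = TRUE, catTest = fisherTest)
print(sf1, prtest = "P")
```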
Re: [R] Hmisc, summary.formula and catTest
Is it at all possible to specify this so that different tests display different parameters, i.e. have the continuous test display F, df and P while the categorical test displays only the P-value?
sf1 <- summary(trt ~ sex + ascites + age, data = ex, test = TRUE, method = "reverse", catTest = u)
print(sf1, prtest = "P")
//M
[R] confidence interval for logistic joinpoint regression from package ljr
I'm trying to run a logistic joinpoint regression using the ljr package. I've been using the forward selection technique to get the number of knots for the analysis, but I'm uncertain about my results and their interpretation. The documentation is rather brief (in the package, and the Statistics in Medicine article is quite technical) and has no good examples. At the moment I'm thinking:
1) Find the number of knots using both forward and backward techniques and see if they are close.
2) Present the annual percent change (APC) for each of the intervals. My present data (1950-2010 in 5-year intervals) give me:
Variables              Coef
b0  Intercept     -131.20404630
g0  t                0.06146463
g1  max(t-tau1,0)   -0.51582466
g2  max(t-tau2,0)    0.43429615
Joinpoints:
1  tau1 = 1990.5
2  tau2 = 1995.5
APC:
1950-1990: exp(0.0615) = 1.06 -> 6%
1990-1995: exp(0.0615 - 0.5158) = exp(-0.454) = 0.63 -> -37%
1995-2010: exp(0.0615 - 0.5158 + 0.4343) = exp(-0.020) = 0.98 -> -2%
3) Preferably, a confidence interval for the APC should be given. However, I haven't figured this out yet. //M
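The APC arithmetic above can be written out as follows. Each segment's slope is the cumulative sum of the hinge coefficients, and APC = 100 * (exp(slope) - 1); in a logistic model exp(slope) is strictly an annual odds ratio, so this is the annual percent change in the odds. For the confidence interval, the sketch assumes a covariance matrix V of (g0, g1, g2) is available, for instance from vcov() of a glm refitted with the hinge terms pmax(t - 1990.5, 0) and pmax(t - 1995.5, 0) held at the estimated joinpoints. Such a Wald interval ignores the uncertainty in the joinpoint locations, and apc_ci is a hypothetical helper, not part of ljr.

```r
# Slope coefficients reported by ljr in the post (per calendar year)
b <- c(g0 = 0.06146463, g1 = -0.51582466, g2 = 0.43429615)

# Segment k's slope is g0 + ... + g(k-1)
slopes <- cumsum(b)
apc <- 100 * (exp(slopes) - 1)
names(apc) <- c("1950-1990.5", "1990.5-1995.5", "1995.5-2010")
round(apc, 1)

# Wald CI for the APC of segment k, given a covariance matrix V of b
apc_ci <- function(b, V, k, level = 0.95) {
  a <- as.numeric(seq_along(b) <= k)   # contrast picking g0 .. g(k-1)
  est <- sum(a * b)
  se <- sqrt(drop(t(a) %*% V %*% a))
  z <- qnorm(1 - (1 - level) / 2)
  100 * (exp(est + c(-1, 1) * z * se) - 1)
}
```

Usage, once V is available: apc_ci(b, V, k = 2) gives the interval for the 1990.5-1995.5 segment. Because the interval is computed on the log-odds scale and then transformed, it is asymmetric around the APC.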
Re: [R] multiple graphs
Problem solved. My bad: there was no problem with cdplot or the graphics part. The problem was the a <- list(...) command, which put all three levels of bar$h.r into a[[1]], so lapply() made only one call. Dropping the list() call sorted it out.
par(mfrow = c(2, 2))
a <- levels(bar$h.r)[c(1, 3, 6)]
print(a)
lapply(a, function(x) {
  a <- subset(bar, h.r == x)
  with(a, cdplot(wh ~ Age, ylab = x))
  #plot.new()
})
Regards, //M
On 8. sep. 2010, at 03.37, David Winsemius wrote:
On Sep 7, 2010, at 8:02 PM, moleps wrote:
Dear all, I'm trying to create multiple graphs on the same page, but they are all stacked on top of each other.
?layout # assuming that the undescribed plotting function is base graphics. Some plotting functions are hard coded and are able to defeat the usual formatting options.
-- David Winsemius, MD West Hartford, CT
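The difference that caused the problem can be seen directly: lapply() over list(v) makes a single call whose argument is the whole vector, whereas lapply() over v makes one call per element. The panel example uses made-up data as a stand-in for bar (the columns h.r, wh and Age are simulated here) and assumes base graphics.

```r
lv <- c("h1", "h2", "h3")
n_list <- length(lapply(list(lv), identity))  # 1 call, gets all three levels
n_vec  <- length(lapply(lv, identity))        # 3 calls, one level each

# Hypothetical stand-in for 'bar'
set.seed(1)
bar <- data.frame(h.r = factor(sample(lv, 300, replace = TRUE)),
                  Age = runif(300, 20, 80),
                  wh  = factor(sample(c("no", "yes"), 300, replace = TRUE)))

op <- par(mfrow = c(2, 2))
invisible(lapply(levels(bar$h.r), function(x) {
  sub <- subset(bar, h.r == x)
  cdplot(wh ~ Age, data = sub, ylab = x)   # one panel per level
}))
par(op)
```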
[R] multiple graphs
Dear all, I'm trying to create multiple graphs on the same page, but they are all stacked on top of each other. My code:
par(mfrow = c(2, 2))
a <- list(levels(bar$h.r)[c(1, 3, 6)])
print(a)
lapply(a, function(x) {
  a <- subset(bar, h.r == x)
  with(a, cdplot(wh ~ Age, ylab = x))
  #plot.new()
})
The plot.new command doesn't help... Any ideas?? //M
[R] summary in Hmisc and Latex
Dear all, With the latest update of Hmisc I no longer have any problems with latex. However, using the ctable option produces LaTeX code that refuses to compile, on both the MiKTeX distribution at work and the MacTeX distribution at home, because of an extra blank line inserted between the multicolumn lines in the LaTeX code. It compiles fine if that line is deleted or if the ctable option is left out. Does this happen to other people as well? Regards, //M
library(Hmisc)
options(digits = 3)
set.seed(173)
sex <- factor(sample(c("m", "f"), 500, rep = TRUE))
age <- rnorm(500, 50, 5)
treatment <- factor(sample(c("Drug", "Placebo"), 500, rep = TRUE))
symp <- c('Headache', 'Stomach Ache', 'Hangnail', 'Muscle Ache', 'Depressed')
symptom1 <- sample(symp, 500, TRUE)
symptom2 <- sample(symp, 500, TRUE)
symptom3 <- sample(symp, 500, TRUE)
Symptoms <- mChoice(symptom1, symptom2, symptom3, label = 'Primary Symptoms')
table(Symptoms)
table(symptom1, symptom2)
f <- summary(treatment ~ age + sex + Symptoms, method = "reverse", test = TRUE)
latex(f, file = "")
latex(f, file = "", ctable = TRUE)
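Since the table compiles once the extra blank line is deleted, one stopgap (a workaround, not an Hmisc feature) is to write the table to a file and strip blank lines afterwards. drop_blank_lines is a hypothetical helper, and it assumes the file holds only the table, since removing blank lines from running text would merge paragraphs.

```r
# Hypothetical helper: remove empty or whitespace-only lines from a .tex file
drop_blank_lines <- function(path) {
  tex <- readLines(path)
  writeLines(tex[nzchar(trimws(tex))], path)
  invisible(path)
}

# Usage with the objects from the post (requires Hmisc):
# latex(f, file = "f.tex", ctable = TRUE)
# drop_blank_lines("f.tex")

# Self-contained check on a scratch file
tmp <- tempfile(fileext = ".tex")
writeLines(c("\\multicolumn{2}{c}{a}\\\\", "", "\\multicolumn{2}{c}{b}\\\\"), tmp)
drop_blank_lines(tmp)
cleaned <- readLines(tmp)
```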
[R] glm prb (Error in `contrasts<-`(`*tmp*`, value = contr.treatment) : )
glm(A ~ B + C + D + E + F, family = binomial(link = "logit"), data = tre, na.action = na.omit)
Error in `contrasts<-`(`*tmp*`, value = contr.treatment) :
  contrasts can be applied only to factors with 2 or more levels
However,
glm(A ~ B + C + D + E, family = binomial(link = "logit"), data = tre, na.action = na.omit)
runs fine, and
glm(A ~ B + C + D + F, family = binomial(link = "logit"), data = tre, na.action = na.omit)
runs fine, but
glm(A ~ E + F, family = binomial(link = "logit"), data = tre, na.action = na.omit)
Error in `contrasts<-`(`*tmp*`, value = contr.treatment) :
  contrasts can be applied only to factors with 2 or more levels
Why is this? Could it be due to collinearity between the two? Regards, //M
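A likely explanation, sketched with made-up data: glm() drops every row with a missing value in any model variable and then drops unused factor levels, so if E and F are rarely observed together, one of them can be left with a single level among the complete cases. That reproduces the message without any collinearity. The data frame below is hypothetical, not the poster's tre.

```r
# Hypothetical data: E and F are only jointly observed in rows 9-10,
# where E is always "y"
tre <- data.frame(
  A = rep(0:1, 10),
  E = factor(c(rep(c("x", "y"), 4), "y", "y", rep(NA, 10))),
  F = factor(c(rep(NA, 8), rep(c("u", "v"), 6)))
)

# Levels actually present among complete cases of the model variables
cc <- na.omit(tre[, c("A", "E", "F")])
nlev <- sapply(cc[, c("E", "F")], function(v) nlevels(droplevels(v)))
print(nlev)

# Reproduces the error from the post
res <- tryCatch(glm(A ~ E + F, family = binomial, data = tre),
                error = function(e) conditionMessage(e))
print(res)
```

Running the same level count on the real data for each failing formula should show which factor collapses to one level.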
Re: [R] accessing the attr(*, "label.table") after importing from spss
Thanks... The following seems to work, although I'm sure there is a more elegant solution:
tre <- dat
b <- length(dat)
a <- attr(dat, "label.table")
for (i in 1:b) {
  if (!is.null(a[[i]]) && length(levels(as.factor(dat[, i]))) == length(a[[i]])) {
    tre[, i] <- factor(dat[, i], labels = names(a[[i]]))
  }
}
tre
Regards, //M
On 25. aug. 2010, at 23.05, Bert Gunter wrote:
?attr
?attributes
-- Bert Gunter
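A somewhat more general version of the loop above, as a sketch: it uses the numeric codes stored in label.table as the factor levels, so labels stay matched to codes even when some codes never occur in the data (the loop above requires every code to be present). The tiny data frame is made up to mimic the str() output in the original post.

```r
# Hypothetical miniature of 'dat' with a label.table attribute like the one
# produced by the conversion (NULL = no value labels for that column)
dat <- data.frame(Ag = c(15L, 16L, 15L), G = c(2L, 2L, 1L))
attr(dat, "label.table") <- list(NULL, c(Male = 1, Female = 2))

lt <- attr(dat, "label.table")
tre <- dat
for (i in seq_along(dat)) {
  codes <- lt[[i]]
  if (!is.null(codes)) {
    # levels = stored codes, labels = their names
    tre[[i]] <- factor(dat[[i]], levels = codes, labels = names(codes))
  }
}
str(tre)
```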
[R] accessing the attr(*, "label.table") after importing from spss
Dear all, I just received an SPSS file from a colleague. read.spss could not read the file because of an error (Unrecognized record type 7, subtype 18 encountered in system file), so instead I converted the file using Stat/Transfer. Looking at my data, I see that most labels are stored in the attributes, and I'd love to access them and convert the pertinent variables to factors without doing the whole factor(levels, labels) thing manually. Is there any remedy for this?? regards, M
str(dat)
'data.frame': 860 obs. of 19 variables:
 $ Ag   : int 15 15 15 15 15 15 15 15 15 15 ...
 $ G    : int 2 2 2 1 1 1 1 1 1 2 ...
 $ GCQ  : int 15 15 15 15 15 15 15 15 15 15 ...
 $ Amn  : int 2 2 2 2 2 2 1 1 1 1 ...
 $ HI   : int 1 1 1 1 1 1 2 2 2 2 ...
 $ Hos  : int 2 2 2 2 2 2 2 2 2 2 ...
 $ Risk : int 2 2 2 2 2 2 2 2 2 2 ...
 $ CTO  : int 2 2 2 2 2 1 1 1 1 1 ...
 $ pat  : int NA NA NA NA NA 2 2 2 2 2 ...
 $ Day  : int 7 7 7 5 4 6 5 7 7 5 ...
 $ Ho   : int NA NA NA NA NA NA NA NA NA NA ...
 $ coh  : int 1 1 1 1 1 1 1 1 1 1 ...
 $ Comp : int 1 1 1 1 1 1 1 1 1 1 ...
 $ Ethan: int 2 2 2 2 2 2 2 2 2 2 ...
 $ Pro  : num NA NA NA NA NA NA NA NA NA NA ...
 $ Ye   : int 1 1 1 3 3 3 1 1 1 3 ...
 $ Ageg : int 1 1 1 1 1 1 1 1 1 1 ...
 $ BAC  : int 0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "val.labels")= chr VL_Gender VL_Amnesia ...
 - attr(*, "var.labels")= chr Age (years) Gender GCQSSA Amnesty ...
 - attr(*, "label.table")=List of 19
  ..$ : NULL
  ..$ : Named num 1 2
  .. ..- attr(*, "names")= chr Male Female
  ..$ : NULL
  ..$ : Named num 1 2
  .. ..- attr(*, "names")= chr Yes No
  ..$ : NULL
  ..$ : Named num 1 2
  .. ..- attr(*, "names")= chr Yes No
  ..$ : Named num 1 2
  .. ..- attr(*, "names")= chr Yes No
  ..$ : Named num 1 2
  .. ..- attr(*, "names")= chr Yes No
  ..$ : Named num 1 2
  .. ..- attr(*, "names")= chr Yes No
  ..$ : Named num 1 2 3 4 5 6 7
  .. ..- attr(*, "names")= chr Monday Tuesday Wednesday Thursday ...
  ..$ : Named num 1 2
  .. ..- attr(*, "names")= chr Yes No
  ..$ : Named num 1 2
  .. ..- attr(*, "names")= chr Yes No
  ..$ : Named num 1 2 3 4 5 6 7
  .. ..- attr(*, "names")= chr Yes Overtriage Undertriage with admission Overtriage with pos ...
  ..$ : Named num 1 2
  .. ..- attr(*, "names")= chr Yes No
  ..$ : NULL
  ..$ : NULL
  ..$ : Named num 1 2 3 4
  .. ..- attr(*, "names")= chr 15 - 24 25-39 40-59 60-
  ..$ : Named num 1 2 3
  .. ..- attr(*, "names")= chr 0.10-0.99 1.00-1.99 2.00-
  ..$ : Named num 0 1 2 3 4
  .. ..- attr(*, "names")= chr sorry undeweight 0.10-0.99 1.00-1.99 ...
[R] Multiple imputation and matching
Dear all,

Is it possible to impute a dataset, create a summary table with summary() from Hmisc, and convert it to LaTeX? I'm mostly familiar with cem and Amelia, hence the example from the cem documentation. The imbalance command is not exactly what I was looking for...

library(cem)
if (require(Amelia)) {
  data(LL)
  n <- dim(LL)[1]
  k <- dim(LL)[2]
  set.seed(123)
  LL1 <- LL
  idx <- sample(1:n, .3*n)
  invisible(sapply(idx, function(x) LL1[x, sample(2:k, 1)] <<- NA))
  imputed <- amelia(LL1, noms=c("black", "hispanic", "treated", "married", "nodegree", "u74", "u75"))
  imputed <- imputed$imputations[1:5]
  ## Here I'd like to produce a table
  mat1 <- cem("treated", datalist=imputed, drop="re78")
  mat1
  ## Here I'd like to produce a table with the matched elements.
}

Regards, M
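One way to get a single descriptive table out of several imputed datasets is to compute the same summary within each completed dataset and then average the point estimates across imputations (the point-estimate part of Rubin's rules). A minimal base-R sketch, assuming the imputations are a plain list of data frames such as imputed$imputations; pool_means() and the toy data are hypothetical, and the resulting data frame could then be passed to a table or LaTeX function of your choice.

```r
# Hypothetical helper: per-group means of `vars`, computed within each imputed
# data set and then averaged across imputations.
pool_means <- function(imps, by, vars) {
  per_imp <- lapply(imps, function(d)
    aggregate(d[vars], by = list(group = d[[by]]), FUN = mean))
  stacked <- do.call(rbind, per_imp)
  aggregate(stacked[vars], by = list(group = stacked$group), FUN = mean)
}

# Two toy "completed" data sets standing in for imputed$imputations
imps <- list(
  data.frame(treated = c(0, 0, 1, 1), age = c(20, 30, 40, 50)),
  data.frame(treated = c(0, 0, 1, 1), age = c(24, 34, 44, 54))
)
pool_means(imps, by = "treated", vars = "age")
```

Averaging is only appropriate for point estimates; pooled standard errors would also need the between-imputation variance.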
Re: [R] coef(summary) and plyr
Upon reading the plyr documentation, that was the distinct impression I got, and I'm glad that whatever expectations I had developed regarding plyr were fulfilled. Thanks for the input, Hadley. Maybe this is a cumbersome solution, but it works. And Matthew, I will most definitely look into the data.table package.

library(plyr)
mydf <- data.frame(x1=rnorm(100), x2=rnorm(100), x3=rnorm(100))
mydf$fac <- factor(sample((0:2), replace=T, 100))
mydf$y <- mydf$x1+0.01+mydf$x2*3-mydf$x3*19+rnorm(100)
dlply(mydf, .(fac), function(df) lm(y~x1+x2+x3, data=df)) -> dl
test <- function(a){
  coef(summary(a)) -> lo
  a <- colnames(lo)
  b <- rownames(lo)
  c <- length(a)
  e <- character(0)
  r <- NULL
  for (x in (1:c)){
    d <- rep(paste(a[1:c], b[x], sep=" "))
    e <- paste(c(e, d))
    t <- lo[x,]
    r <- c(r, t)
    names(r) <- e
  }
  return(r)
}
ldply(dl, function(x) test(x)) -> g
g

Regards, Moleps

On 9. aug. 2010, at 19.55, Hadley Wickham wrote:

>>> That's exactly what dlply does - so you should never have to do that yourself.
>>
>> I'm unclear what you are saying. Are you saying that the plyr function _should_ have examined the objects in that list and determined that there were 4 rows and properly labeled the rows to indicate which list they came from?
>
> Yes, exactly. It's the output from coef(summary(x)) that makes it look like this isn't happening.
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
Re: [R] coef(summary) and plyr
Correction: columns and rows were mixed up, so the loop only worked when the number of rows was less than or equal to the number of columns.

//M

test <- function(a){
  coef(summary(a)) -> lo
  a <- colnames(lo)
  b <- rownames(lo)
  c <- length(a)
  e <- character(0)
  r <- NULL
  for (x in (1:length(b))){
    d <- rep(paste(a[1:c], b[x], sep=" "))
    e <- paste(c(e, d))
    t <- lo[x,]
    r <- c(r, t)
    names(r) <- e
  }
  return(r)
}
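A base-R sketch of the same wide layout (one row per model, one column per statistic and term), using split() and lapply() in place of dlply() and replacing the loop in test() with a transpose; flatten_coefs() is a hypothetical name. Column names follow the same "statistic term" pattern as paste(..., sep = " ") in the loop.

```r
# Simulated data as in the post (seed added for reproducibility)
set.seed(1)
mydf <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
mydf$fac <- factor(sample(0:2, 100, replace = TRUE))
mydf$y <- mydf$x1 + 0.01 + mydf$x2 * 3 - mydf$x3 * 19 + rnorm(100)

# One lm() per level of fac (base-R stand-in for dlply)
fits <- lapply(split(mydf, mydf$fac), function(d) lm(y ~ x1 + x2 + x3, data = d))

# Flatten a coefficient matrix into one named vector, row by row,
# with names like "Estimate (Intercept)" as in the original loop
flatten_coefs <- function(fit) {
  cf <- coef(summary(fit))
  v <- as.vector(t(cf))                   # row-major order: all stats for term 1, then term 2, ...
  names(v) <- as.vector(outer(colnames(cf), rownames(cf), paste))
  v
}

wide <- do.call(rbind, lapply(fits, flatten_coefs))   # rownames are the levels of fac
wide[, 1:4]
```

Because the names are built from the same matrix that supplies the values, this works for any number of terms or statistics, which is the row/column mix-up the correction above had to fix by hand.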
[R] coef(summary) and plyr
Dear all,

I'm having trouble getting a list of regression coefficients back into a data frame.

library(plyr)
mydf <- data.frame(x1=rnorm(100), x2=rnorm(100), x3=rnorm(100))
mydf$fac <- factor(sample((0:2), replace=T, 100))
mydf$y <- mydf$x1+0.01+mydf$x2*3-mydf$x3*19+rnorm(100)
dlply(mydf, .(fac), function(df) lm(y~x1+x2+x3, data=df)) -> dl

Here I'd like to use ldply(dl, coef(summary)) or something similar, but I can't figure it out...

Best, M
Re: [R] coef(summary) and plyr
ldply doesn't need a grouping variable, as far as I understand the command:

Description: For each element of a list, apply function then combine results into a data frame.
Usage: ldply(.data, .fun = NULL, ..., .progress = "none")

regards, M

On 9. aug. 2010, at 15.33, David Winsemius wrote:

> dfdl <- ldply(dl, function(x) coef(summary(x)) )
>
> Doesn't create a grouping variable, so:
>
> dfdl$group = rep(0:2, each=4)
>
> David Winsemius, MD
> West Hartford, CT
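David's rep(0:2, each = 4) works because every model here has exactly four coefficients. A base-R sketch (split() and lapply() in place of plyr) that instead carries the group label and term name along with each model's coefficient table, so it does not depend on that count:

```r
# Simulated data as in the post (seed added for reproducibility)
set.seed(1)
mydf <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
mydf$fac <- factor(sample(0:2, 100, replace = TRUE))
mydf$y <- mydf$x1 + 0.01 + mydf$x2 * 3 - mydf$x3 * 19 + rnorm(100)

fits <- lapply(split(mydf, mydf$fac), function(d) lm(y ~ x1 + x2 + x3, data = d))

# Long format: one row per coefficient, tagged with its group and term
long <- do.call(rbind, lapply(names(fits), function(g) {
  cf <- coef(summary(fits[[g]]))
  data.frame(group = g, term = rownames(cf), cf,
             row.names = NULL, check.names = FALSE, stringsAsFactors = FALSE)
}))
head(long)
```

check.names = FALSE keeps column names such as "Std. Error" and "Pr(>|t|)" unchanged.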
[R] Remove squares from scatter3D
Dear All,

I've been trying to find an option in scatter3D from Rcmdr to remove the individual points from the plots, but with no luck so far. Removing the residuals is easy, but I cannot find a similar option for the points. Is there such an option that can be set to FALSE?

Best, //M
[R] Latex problem in Hmisc (3.8-1) and Mac Os X with R 2.11.1
Dear all,

I posted a more or less identical message as a follow-up to another question I asked, but under a different subject line. I'm trying again, now under the correct subject.

Upon running the code below (from the Hmisc latex() documentation), I believe the call to summary.formula is all right and produces wonderful tables, but the latex command produces a correctly formatted table in which all the numbers and the test columns are wrong. I've pasted both the R code and the resulting LaTeX code, annotated with comments from the run. Does the same code produce correct cell entries on other installations?

//M

library(Hmisc)
options(digits=3)
set.seed(173)
sex <- factor(sample(c("m","f"), 500, rep=TRUE))
age <- rnorm(500, 50, 5)
treatment <- factor(sample(c("Drug","Placebo"), 500, rep=TRUE))
symp <- c('Headache','Stomach Ache','Hangnail', 'Muscle Ache','Depressed')
symptom1 <- sample(symp, 500, TRUE)
symptom2 <- sample(symp, 500, TRUE)
symptom3 <- sample(symp, 500, TRUE)
Symptoms <- mChoice(symptom1, symptom2, symptom3, label='Primary Symptoms')
table(Symptoms)
table(symptom1, symptom2)
f <- summary(treatment ~ age + sex + Symptoms, method="reverse", test=TRUE)
g <- summary(treatment ~ age + sex + symptom1, method="reverse", test=TRUE)
latex(g)
latex(g, file="")

% latex.default(cstats, title = title, caption = caption, rowlabel = rowlabel, col.just = col.just, numeric.dollar = FALSE, insert.bottom = legend, rowname = lab, dcolumn = dcolumn, extracolheads = extracolheads, extracolsize = Nsize, ...)
%
\begin{table}[!tbp]
\caption{Descriptive Statistics by treatment\label{g}}
\begin{center}
\begin{tabular}{lccc}\hline\hline
\multicolumn{1}{l}{}\multicolumn{1}{c}{Drug}\multicolumn{1}{c}{Placebo}\multicolumn{1}{c}{Test Statistic}\tabularnewline
\multicolumn{1}{c}{{\scriptsize $N=263$}}\multicolumn{1}{c}{{\scriptsize $N=237$}}\tabularnewline
\hline
age114\tabularnewline
sex~:~m672\tabularnewline
symptom1~:~Depressed433\tabularnewline
Hangnail561\tabularnewline
Headache421\tabularnewline
Muscle~Ache351\tabularnewline
Stomach~Ache241\tabularnewline
\hline
\end{tabular}
\end{center}
\noindent {\scriptsize $a$\ }{$b$\ }{\scriptsize $c$\ } represent the lower quartile $a$, the median $b$, and the upper quartile $c$\ for continuous variables.\\Numbers after percents are frequencies.\\\indent Tests used:\\\textsuperscript{\normalfont 1}Wilcoxon test; \textsuperscript{\normalfont 2}Pearson test
\end{table}

Then I tried another example, from Harrell's Statistical Tables and Plots:

rm(list=ls())
library(Hmisc)
getHdata(prostate)
# Variables in prostate had units in ( ) inside variable labels. Move
# these units of measurements to separate units attributes
# wt is an exception. It has ( ) in its label but this does not denote units
# Also make hg have a legal R plotmath expression
prostate <- upData(prostate, moveUnits=TRUE, units=c(wt="", hg="g/100*ml"), labels=c(wt="Weight Index = wt(kg)-ht(cm)+200"))
attach(prostate)
stage <- factor(stage, 3:4, c("Stage 3","Stage 4"))
s6 <- summary(stage ~ rx+age+wt+pf+hx+sbp+dbp+ekg+hg+sz+sg+ap+bm, method="reverse", overall=TRUE, test=TRUE)
options(digits=2)
w <- latex(s6, size="smaller[3]", outer.size="smaller", Nsize="smaller", long=TRUE, prmsd=TRUE, msdsize="smaller", middle.bold=TRUE, ctable=TRUE)
## This refused to run as long as ctable=TRUE was included, but without it
latex(s6)
## I do get a nicely formatted table, but again the numbers are all wrong...
## Also,
latex(s6, long=TRUE, prmsd=TRUE, msdsize="smaller", middle.bold=TRUE)
## makes no difference from latex(s6) alone with regard to formatting...

Quite frustrating. Any suggestions?

//M
[R] Latex and r
Dear R'ers,

I'm trying to get a summary table using latex and summary in the rms package, to no avail. I'm running R 2.10.1 on Mac OS X Snow Leopard, and I have the MacTeX 2009 distribution installed. Am I missing anything obvious?

//M

options(digits=3)
set.seed(173)
sex <- factor(sample(c("m","f"), 500, rep=TRUE))
age <- rnorm(500, 50, 5)
treatment <- factor(sample(c("Drug","Placebo"), 500, rep=TRUE))
f <- summary(treatment ~ age + sex + Symptoms, method="reverse", test=TRUE)
latex(f)

This results in the following:

This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009)
 entering extended mode
(/var/folders/q9/q9COp2FREsikCyHB7w+OxE+++TI/-Tmp-//RtmpVIk0iB/file587f83cb.tex
LaTeX2e 2009/09/24
Babel v3.8l and hyphenation patterns for english, usenglishmax, dumylang, nohyphenation, german-x-2009-06-19, ngerman-x-2009-06-19, ancientgreek, ibycus, arabic, basque, bulgarian, catalan, pinyin, coptic, croatian, czech, danish, dutch, esperanto, estonian, farsi, finnish, french, galician, german, ngerman, monogreek, greek, hungarian, icelandic, indonesian, interlingua, irish, italian, kurmanji, latin, latvian, lithuanian, mongolian, mongolian2a, bokmal, nynorsk, polish, portuguese, romanian, russian, sanskrit, serbian, slovak, slovenian, spanish, swedish, turkish, ukenglish, ukrainian, uppersorbian, welsh, loaded.
(/usr/local/texlive/2009/texmf-dist/tex/latex/base/report.cls
Document Class: report 2007/10/19 v1.4h Standard LaTeX document class
(/usr/local/texlive/2009/texmf-dist/tex/latex/base/size10.clo))
(/usr/local/texlive/2009/texmf-dist/tex/latex/geometry/geometry.sty
(/usr/local/texlive/2009/texmf-dist/tex/latex/graphics/keyval.sty)
(/usr/local/texlive/2009/texmf-dist/tex/generic/oberdiek/ifpdf.sty)
(/usr/local/texlive/2009/texmf-dist/tex/generic/oberdiek/ifvtex.sty)
(/usr/local/texlive/2009/texmf-dist/tex/xelatex/xetexconfig/geometry.cfg))
No file file587f83cb.aux.
*geometry auto-detecting driver*
*geometry detected driver: dvips*
Overfull \hbox (1.14412pt too wide) in paragraph at lines 9--23
[] [1] (./file587f83cb.aux)

LaTeX Warning: Label(s) may have changed. Rerun to get cross-references right.

 )
(see the transcript file for additional information)
Output written on file587f83cb.dvi (1 page, 1620 bytes).
Transcript written on file587f83cb.log.
sh: xdvi: command not found
Re: [R] Latex and r
xdvi is installed in the same location as yours. I even reinstalled MacTeX, and it still doesn't work. But since I'm now convinced it's related to my LaTeX distribution, I'll take the problem elsewhere.

Regards, //M

On 16. juni 2010, at 17.19, Prof Brian Ripley wrote:

> On Wed, 16 Jun 2010, Erik Iverson wrote:
>
>> moleps wrote:
>>> Dear R'ers, I'm trying to get a summary table using latex and summary in the rms package, to no avail. I'm running R 2.10.1 on Mac OS X Snow Leopard, and I have the MacTeX 2009 distribution installed. Am I missing anything obvious?
>>> file587f83cb.log.
>>> sh: xdvi: command not found
>>
>> You apparently don't have xdvi installed, which is used to view the resulting
>
> xdvi is part of MacTeX 2009. So the latter may be installed, but it is not in the path, or it is incomplete: I have
>
> tystie% which xdvi
> /usr/texbin/xdvi
>
>> document. How you get that installed on your OS, I don't know.
>>
>> Depending on what you're doing, you might want to use the file argument of the latex function to output a LaTeX file, and then do further processing. Also, assigning the latex function call to a variable, x <- latex(...), will suppress printing of the object, which is ultimately what is trying to use xdvi.
>
> --
> Brian D. Ripley, rip...@stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Latex and r (using summary from RMS)
Dear all,

After spending all day and most of the night on this, I did a fresh R installation, and now it works. The question now is this: upon running the Hmisc latex() example code (g <- summary(treatment ~ age + sex + symptom1, method="reverse", test=TRUE) followed by latex(g)), I believe the call to summary.formula is all right, but the latex command results in a totally different table in which all the numbers and test columns are wrong, as the generated LaTeX file shows. Is this still a matter for the installation, or is there something in the latex syntax I haven't grasped?

Then I did another example from Harrell's Statistical Tables and Plots, using the prostate data (s6 <- summary(stage ~ rx+age+wt+pf+hx+sbp+dbp+ekg+hg+sz+sg+ap+bm, method="reverse", overall=TRUE, test=TRUE)). The call

w <- latex(s6, size="smaller[3]", outer.size="smaller", Nsize="smaller", long=TRUE, prmsd=TRUE, msdsize="smaller", middle.bold=TRUE, ctable=TRUE)

refused to run as long as ctable=TRUE was included. Without it, latex(s6) gives a nicely formatted table, but again the numbers are all wrong. Also, latex(s6, long=TRUE, prmsd=TRUE, msdsize="smaller", middle.bold=TRUE) makes no difference from latex(s6) alone with regard to formatting.

Quite frustrating. Any suggestions?

//M

On 16. juni 2010, at 20.10, Kevin E. Thorpe wrote:

> moleps wrote:
>> Dear R'ers, I'm trying to get a summary table using latex and summary in the rms package, to no avail. I'm running R 2.10.1 on Mac OS X Snow Leopard, and I have the MacTeX 2009 distribution installed. Am I missing anything obvious?
[R] marginal structural models
Dear listers,

Does anyone have any experience running marginal structural models in R, or can anyone point me toward good tutorials on this?

Regards, //M
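For the simplest case, a single (point) treatment with measured baseline confounders, a marginal structural model can be fitted in base R by inverse-probability-of-treatment weighting: model the treatment, build stabilized weights, then fit a weighted regression of the outcome on treatment alone. The sketch below uses simulated data (all variable names are hypothetical) and a correctly specified treatment model. Time-varying treatments need weights multiplied over time points, which this does not cover.

```r
# Simulated point-treatment data: confounder L affects both treatment A and outcome Y;
# the true marginal effect of A on Y is 2.
set.seed(42)
n <- 2000
L <- rnorm(n)
A <- rbinom(n, 1, plogis(0.8 * L))
Y <- 1 + 2 * A + 1.5 * L + rnorm(n)
dat <- data.frame(L, A, Y)

# 1. Treatment model and stabilized inverse-probability-of-treatment weights
ps <- fitted(glm(A ~ L, family = binomial, data = dat))   # P(A = 1 | L)
pA <- mean(dat$A)                                         # marginal P(A = 1)
dat$sw <- ifelse(dat$A == 1, pA / ps, (1 - pA) / (1 - ps))

# 2. Marginal structural model: weighted regression of Y on A only
msm <- lm(Y ~ A, data = dat, weights = sw)
coef(msm)["A"]   # should be near 2; an unweighted lm(Y ~ A) is biased upward here
```

The standard errors lm() reports ignore the estimated weights, so a robust (sandwich) variance is usually recommended. CRAN packages such as ipw wrap the weight construction for both point and time-varying treatments.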
Re: [R] paste together a string object later to be utilized in a function
I'm sorry for not describing my motive more clearly. This is what I'm trying to do: take a survival object and use it in ggplot.

ggkm <- function(time, event, stratum) {
  m2s <- Surv(time, as.numeric(event))
  fit <- survfit(m2s ~ stratum)
  f$time <- fit$time
  f$surv <- fit$surv
  f$strata <- c(rep(names(fit$strata[1]), fit$strata[1]), rep(names(fit$strata[2]), fit$strata[2]))
  f$upper <- fit$upper
  f$lower <- fit$lower
  r <- ggplot(f, aes(x=time, y=surv, fill=strata, group=strata)) + geom_line() + geom_ribbon(aes(ymin=lower, ymax=upper), alpha=0.3)
  return(r)
}

My problem is that I can have more than two strata, and I would like the function to detect the number of strata automatically. So a quick hack for creating f$strata when I don't know the number of strata would be appreciated. This is as close as I get:

paste("rep(names(fit$strata[", 1:length(fit$strata), "]),fit$strata[", 1:length(fit$strata), "])", sep="")

However, this results in multiple strings, and I haven't yet discovered how to pass them so that f$strata is created. In Stata this could easily be done by appending to a macro, but as I am a recent convert I don't know how to do this in R (yet...?).

Regards, //M

On Mon, Jun 7, 2010 at 2:50 AM, Joris Meys jorism...@gmail.com wrote:

> Wild guess, but it looks like you are looking at:
>
> ts <- list(a=1:5)
> names(ts$a) <- letters[1:5]
> v <- paste("rep(names(ts$a[", 1:b, "]),ts$a[", 1:b, "])", sep="")
> sapply(v, function(x){eval(parse(, text=x))})
> $`rep(names(ts$a[1]),ts$a[1])`
> [1] "a"
>
> $`rep(names(ts$a[2]),ts$a[2])`
> [1] "b" "b"
>
> $`rep(names(ts$a[3]),ts$a[3])`
> [1] "c" "c" "c"
>
> $`rep(names(ts$a[4]),ts$a[4])`
> [1] "d" "d" "d" "d"
>
> $`rep(names(ts$a[5]),ts$a[5])`
> [1] "e" "e" "e" "e" "e"
>
> assign("test", eval(parse(text=v[3])))
> test
> [1] "c" "c" "c"
>
> Cheers
> Joris
>
> --
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
> tel : +32 9 264 59 87
> joris.m...@ugent.be
Re: [R] paste together a string object later to be utilized in a function
Sorry for bothering all of you. In the end it turned out to be much simpler than I thought; it takes a while to get used to the vectorizing idea.

require(survival)
require(ggplot2)
ggkm <- function(time, event, stratum) {
  stratum <- as.factor(stratum)
  m2s <- Surv(time, as.numeric(event))
  fit <- survfit(m2s ~ stratum)
  w <- fit$time
  k <- fit$surv
  o <- length(levels(stratum))
  strata <- c(rep(names(fit$strata[1:o]), fit$strata[1:o]))
  upper <- fit$upper
  lower <- fit$lower
  f <- data.frame(w, k, strata, upper, lower)
  r <- ggplot(f, aes(x=w, y=k, fill=strata, group=strata)) + geom_line(aes(color=strata)) + geom_ribbon(aes(ymin=lower, ymax=upper), alpha=0.3) + xlim(0, fit$maxtime) + ylim(0, 1)
  r <- r + scale_fill_brewer(f$strata, palette="Set1") + scale_color_brewer(f$strata, palette="Set1")
  return(r)
}
data(lung)
with(lung, ggkm(time, status, sex))
with(lung, ggkm(time, status, pat.karno))

//M

On 7. juni 2010, at 22.31, Greg Snow wrote:

> Does the collapse argument to the paste function do what you want? Possibly nested inside another paste.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
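The key step in the final ggkm() is rep(names(fit$strata), fit$strata): fit$strata holds the number of time points in each stratum, named by stratum, so repeating the names by those counts yields one label per row for any number of strata, without first computing o. A sketch that builds only the plotting data frame (km_df() is a hypothetical name; plotting is left out so it runs without ggplot2), using the lung data shipped with survival:

```r
library(survival)

# Hypothetical helper: survfit() results as a data frame with one stratum
# label per time point, for any number of strata (at least two).
km_df <- function(time, event, stratum) {
  fit <- survfit(Surv(time, event) ~ stratum)
  data.frame(
    time   = fit$time,
    surv   = fit$surv,
    lower  = fit$lower,
    upper  = fit$upper,
    # fit$strata: number of time points per stratum, named "stratum=<level>"
    strata = rep(names(fit$strata), fit$strata),
    stringsAsFactors = FALSE
  )
}

d <- with(lung, km_df(time, status, sex))
table(d$strata)
```

fit$strata is NULL when there is only one stratum, so this sketch assumes the grouping variable has at least two levels.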
[R] paste together a string object later to be utilized in a function
Dear r-listers,

I need to pass a string to a function. However, the length of the string depends on the length of a vector.

b <- length(h)
v <- paste("rep(names(ts$a[", 1:b, "]),ts$a[", 1:b, "])", sep="")

Is it possible somehow to pass this as an argument to a function later on?

Regards, //M
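Two sketches of the ideas raised in this thread, using a toy named vector (counts is hypothetical): rep() with a vector of counts produces the expanded labels directly, which avoids building code as text and evaluating it with eval(parse()), and paste()'s collapse argument, as Greg Snow suggested, joins a character vector into one string.

```r
# Named counts, playing the role of ts$a or fit$strata in the thread
counts <- c(a = 1, b = 2, c = 3)

# Vectorized: repeat each name by its count; no string-building or eval(parse()) needed
labels <- rep(names(counts), counts)

# paste(..., collapse = ) joins a character vector into a single string
one_string <- paste(labels, collapse = ",")

labels
one_string
```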
[R] apply min function rowwise
I'm trying to tease out the minimum value from each row of a data frame in which all the variables are dates.

apply(canc[,vec], 1, function(x) min(x, na.rm=T))

However, it returns only empty strings for the entire data frame, except for one date value (which is not the minimum date). I've also tried

apply(canc[,vec], 1, function(x) max(x, na.rm=T))

which provides values row-wise, but many of them are not in fact the largest in the row. Any advice?

Regards, //M
Re: [R] apply min function rowwise
Appreciate it...

//M

On 5. juni 2010, at 20.11, Joshua Wiley wrote:

> On Sat, Jun 5, 2010 at 10:22 AM, moleps mole...@gmail.com wrote:
>> Thanks. It was only the first instance that was of class Date. The rest were factors, so that explains it. If I want to change the rest in vec into class Date (there are many of them...), neither
>>
>> as.Date(canc[,vec], "%d.%m.%Y")
>>
>> nor
>>
>> sapply(canc[,vec], FUN=function(x) as.date(x, "%d.%m.%Y"))
>>
>> works. What is the easy solution to this?
>
> This is the nicest solution that comes to mind:
>
> as.data.frame(lapply(X=samp.dat, FUN=as.Date, format="%d.%m.%Y"))
>
> I believe the problem is that sapply() coerces the results (by default, when simplify=TRUE) using as.vector(), leaving you with the number of days since the origin. Anyway, using as.data.frame() on the list output from lapply() seems to work.
>
> Josh
>
>> Regards, //M
>>
>> On 5. juni 2010, at 18.30, Joshua Wiley wrote:
>>
>>> Hello M,
>>>
>>> My guess is that it has something to do with the class of the variables. Perhaps you could provide a small sample data frame? Also, you might try running str() on your data frame and seeing whether the results are what you would expect.
>>>
>>> As a side note, it is not necessary to make an anonymous function here, as you are allowed to pass arguments to the function applied:
>>>
>>> apply(canc[,vec], 1, min, na.rm=TRUE)
>>>
>>> Best regards,
>>> Josh
>
> --
> Joshua Wiley
> Senior in Psychology
> University of California, Riverside
> http://www.joshuawiley.com/
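A likely explanation for the odd results: apply() first converts the data frame to a matrix, and with Date or factor columns that matrix is character, so min() and max() compare strings rather than dates. A sketch that converts the columns with lapply() (as Joshua suggests) and then takes row-wise extremes with pmin()/pmax() via do.call(), which keeps the Date class; the toy canc data frame is hypothetical.

```r
# Toy data: date columns stored as factors in day.month.year format, as in the thread
canc <- data.frame(
  d1 = factor(c("01.02.2010", "15.03.2009", NA)),
  d2 = factor(c("20.01.2010", NA, "05.05.2008")),
  d3 = factor(c(NA, "01.01.2011", "07.07.2007"))
)
vec <- c("d1", "d2", "d3")

# Convert every selected column to Date (lapply keeps the class; sapply would simplify it away)
canc[vec] <- lapply(canc[vec], as.Date, format = "%d.%m.%Y")

# Row-wise minimum/maximum without apply(): pmin/pmax work element-wise across the columns
canc$first <- do.call(pmin, c(canc[vec], na.rm = TRUE))
canc$last  <- do.call(pmax, c(canc[vec], na.rm = TRUE))
canc
```

Because pmin() and pmax() operate on whole columns at once, this also avoids the per-row loop that apply() implies.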
[R] data-management: Rowwise NA
Dear R'ers,

In this mock dataset, how can I generate a logical variable based on whether just test or test3 is NA in each row?

test <- sample(c("A",NA,"B"), 100, replace=T)
test2 <- sample(c("A",NA,"B"), 100, replace=T)
test3 <- sample(c("A",NA,"B"), 100, replace=T)
tes <- cbind(test, test2, test3)
sam <- c("test","test3")
apply(subset(tes, select=sam), 1, FUN=function(x) is.na(x))

However, this just tests whether each variable is missing or not per row. I'd like an -or- function here that provides one TRUE/FALSE per row based on whether test or test3 is NA. I guess it would be easy to do by subsetting in this example, but I figure there is a more elegant way when -sam- contains 50 variables...

//M
Re: [R] data-management: Rowwise NA
-any- was my fix... Appreciate it.

//M

On 3. juni 2010, at 21.33, Phil Spector wrote:

> ?any
>
> Not really a reproducible answer, but I think you're looking for
>
> apply(tes[,sam], 1, function(x) any(is.na(x)))
>
> - Phil Spector
> Statistical Computing Facility
> Department of Statistics
> UC Berkeley
> spec...@stat.berkeley.edu
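Phil Spector's apply() answer works; a vectorized equivalent that scales to any number of columns in sam uses rowSums() on the logical matrix returned by is.na(), or equivalently !complete.cases(). A sketch on the mock data from the question, with a seed added for reproducibility:

```r
set.seed(1)
test  <- sample(c("A", NA, "B"), 100, replace = TRUE)
test2 <- sample(c("A", NA, "B"), 100, replace = TRUE)
test3 <- sample(c("A", NA, "B"), 100, replace = TRUE)
tes <- cbind(test, test2, test3)
sam <- c("test", "test3")   # could just as well hold 50 column names

# TRUE where at least one of the selected columns is NA in that row
any_na <- rowSums(is.na(tes[, sam, drop = FALSE])) > 0

# Equivalent formulation: a row is "incomplete" if any selected value is NA
any_na2 <- !complete.cases(tes[, sam, drop = FALSE])
```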
[R] two columns into one
Dear R'ers,

How can I create one single factor variable from two variables, incorporating all possible combinations of the values?

test <- sample(c("A",NA,"B"), 100, replace=T)
test2 <- sample(c("E","F","A"), 100, replace=T)
tes <- cbind(test, test2)

Pseudocode:

r <- function(test, test2)
r
AE AF AA NAE NAF NAA BE BF BA

//M
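One way to do this, assuming NA should count as its own category (as the NAE, NAF and NAA codes in the pseudocode suggest): paste() the two columns together, since paste() turns NA into the string "NA", and set the levels explicitly so that all nine combinations are levels even if some never occur. This is a sketch on the mock data, not the only approach; interaction() is an alternative when NA values should stay missing.

```r
set.seed(1)
test  <- sample(c("A", NA, "B"), 100, replace = TRUE)
test2 <- sample(c("E", "F", "A"), 100, replace = TRUE)

# All combinations in the order of the pseudocode: AE AF AA NAE NAF NAA BE BF BA
all_levels <- as.vector(t(outer(c("A", "NA", "B"), c("E", "F", "A"), paste, sep = "")))

# paste() turns NA into "NA", so a missing test value yields codes such as "NAF"
r <- factor(paste(test, test2, sep = ""), levels = all_levels)
table(r)
```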
Re: [R] Data manipulation problem
In the end, after going at it from scratch, this worked out all right:

## set up data
age.cat <- seq(0, 100, 10)
year <- 1953:(1953 + 55)
dat.vec <- sample(1:10, length(age.cat) * length(year), replace = TRUE)
dat.matrix <- matrix(dat.vec, nrow = length(age.cat), ncol = length(year))
rownames(dat.matrix) <- age.cat
colnames(dat.matrix) <- year
year.int <- seq(1950, 2010, 5)
## 5-year periods (despite its name, age.div groups the years)
age.div <- cut(year, year.int, include.lowest = TRUE)
## summarise by another variable
a <- do.call(cbind, by(t(dat.matrix), age.div, function(x) colSums(x))); a

//M

On 6. apr. 2010, at 21.41, David Winsemius wrote:
> On Apr 6, 2010, at 3:30 PM, David Winsemius wrote:
>> On Apr 6, 2010, at 9:56 AM, moleps islon wrote:
>>> OK... next question.. Which is still a data manipulation problem so I believe the heading is still OK.
>>> ## So now I read my population data from excel.
>>
>> No, you read it from a text file, and providing the first ten lines of that text file should have been really easy. Read the Posting Guide for advice about offering datasets either as structure() objects with dput or dump, or as attached files with a *.txt extension (not .csv). Just change the file name with your file browser.
>>
>>> pop <- read.csv("pop.csv")
>>> typeof(pop) ## yields a list
>>
>> Really? I would have guessed it to yield just "list".
>>
>>> where I have age-specific population rows and a yearly column population, where the years are prefixed by X
>>
>> And had you used class(pop) you would have learned it was a data frame, and even more informative would have been str(pop).
>>
>>> c <- (1953:2008)
>>
>> No, no, no. Do not use variable names that are important function names. The R interpreter can (usually) keep things straight, but it is our brains that experience problems. Other function names to avoid: data, df, cut, mean, sd, list, vector, matrix.
>>
>>> names(pop) <- c
>>> c.div <- cut(c, break = seq(1950, 2010, by = 5)
>>
>> (You should have gotten an error here.) After fixing the error, did you notice that there were only 3 of the first level??? Watch out for cut(). It uses the default convention of ( , ], i.e. open interval at right, er, left, which is backwards to what some (most?) of us think natural. Because of that, the lowest level gets dropped unless you take special precautions. That is undoubtedly why Harrell set up his Hmisc::cut2 to have the default be [ , ).
>>
>> Aggregating across columns? Certainly possible, but maybe not as natural a fit to functions like split as would occur with working across rows. I suppose you could use something like this untested (because _still_ no sample dataset provided) code:
>>
>> apply(pop, 1, # this works a row at a time
>>       function(x) tapply(x, list(c.div), sum)) # or use aggregate, which uses tapply
>>
>> I'm not sure it will work, since I don't know if the column names would get carried over into x by apply(). You might need to create a separate index that used the numeric positions of the columns rather than their names. Perhaps use c.div <- seq(0, (2008-1953)) %/% 5 or some such inside tapply.
>>
>>> Now I'd like to sum the age-specific population over the individual levels of c.div and generate a new table for this with age-specific rows and columns containing the 5-year bins instead of the original yearly data. Do I have to program this from scratch or is it possible to use an already existing function?
>>
>> I think you ought to read more introductory material (and the Posting Guide regarding how to offer example datasets). In this case there are many functions that do data aggregation, and most of them should be illustrated in a good introductory text.
>>
>> -- 
>> David.
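David's warning about cut() can be checked directly. The sketch below uses made-up values that sit exactly on the break points. By default (right = TRUE) intervals are closed on the right, so a value equal to the lowest break falls outside every interval and becomes NA unless include.lowest = TRUE is set. With right = FALSE the intervals become [ , ), and include.lowest then closes the top interval instead.

```r
## Made-up values lying exactly on the break points
x <- c(0, 5, 10)
brks <- c(0, 5, 10)

## Default: (0,5], (5,10]; the value 0 matches no interval and becomes NA
cut(x, brks)
## include.lowest = TRUE closes the first interval: [0,5], (5,10]
cut(x, brks, include.lowest = TRUE)
## right = FALSE gives [0,5), [5,10); now the top value 10 is lost
cut(x, brks, right = FALSE)
## right = FALSE plus include.lowest = TRUE closes the last interval: [0,5), [5,10]
cut(x, brks, right = FALSE, include.lowest = TRUE)
```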
Re: [R] Data manipulation problem
So... here we try again.

## generate dataset
age.cat <- seq(0, 100, 10)
year <- 1953:(1953 + 55)
data.vec <- sample(1:10, length(age.cat) * length(year), replace = TRUE)
data.matrix <- matrix(data.vec, nrow = length(age.cat), ncol = length(year))
rownames(data.matrix) <- age.cat
colnames(data.matrix) <- year
## divide into 5-year periods
age.div <- cut(year, seq(1950, 2010, 5), include.lowest = TRUE)
## the breaks extend beyond the range of my data, so I doubt include.lowest matters

Now what I'd like to do is summarise each row within the 5-year intervals. I did read about apply in its different variants and Dalgaard, but I do not understand how it could be applied in this setting. I tried making an array and summarising by that (I used the vector to fill an array of dimensions length(age.cat) x max(table(age.div)) x length(age.div)). It worked but required a bit of tweaking (inserting null columns), and I find myself in this situation quite often, whereby I need to add multiple columns based on another vector, so I'd be very interested in a more general approach.

//M
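For the general "sum columns by a grouping vector" problem, base R's rowsum() sums the rows of a matrix within groups, so transposing before and after gives column sums per group. The sketch below regenerates a simulated age x year matrix (with a seed added, and the grouping factor named period rather than age.div) and checks that rowsum(), David's apply()/tapply() idea, and moleps' by() version all give the same table.

```r
set.seed(42)
## Simulated age x year matrix, as in the message above
age.cat <- seq(0, 100, 10)
year <- 1953:2008
dat.matrix <- matrix(sample(1:10, length(age.cat) * length(year), replace = TRUE),
                     nrow = length(age.cat), ncol = length(year),
                     dimnames = list(age.cat, year))
## One factor value per column (year)
period <- cut(year, breaks = seq(1950, 2010, by = 5), include.lowest = TRUE)

## 1. rowsum() sums rows within groups: transpose, sum, transpose back
by.rowsum <- t(rowsum(t(dat.matrix), period))

## 2. apply() over rows with tapply() inside; apply() returns periods x ages,
##    so transpose to get ages x periods
by.tapply <- t(apply(dat.matrix, 1, function(x) tapply(x, period, sum)))

## 3. by() on the transposed matrix, then bind the per-period column sums
by.by <- do.call(cbind, by(t(dat.matrix), period, colSums))

dim(by.rowsum)  # 11 age groups x 12 periods
```

rowsum() is usually the most direct of the three because it works on the matrix without building a data frame or an intermediate list.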
Re: [R] Data manipulation problem
OK... next question.. Which is still a data manipulation problem, so I believe the heading is still OK.

## So now I read my population data from Excel.
pop <- read.csv("pop.csv")
typeof(pop) ## yields a list

where I have age-specific population rows and one population column per year, where the years are prefixed by X.

c <- (1953:2008)
names(pop) <- c
c.div <- cut(c, break = seq(1950, 2010, by = 5)

Now I'd like to sum the age-specific population over the individual levels of c.div and generate a new table with age-specific rows and with columns containing the 5-year bins instead of the original yearly data. Do I have to program this from scratch, or is it possible to use an already existing function?

//M
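A small self-contained sketch of the setup described above, using an inline CSV in place of pop.csv (the numbers and the layout, an age column followed by one column per year, are assumptions). read.csv() runs the header through make.names(), which is where the X prefix comes from; the years can be recovered with sub(), or the header kept as written with check.names = FALSE. The yearly columns are then summed into 5-year periods with rowsum(), using the name years rather than c, as David advises.

```r
## Inline stand-in for pop.csv (made-up numbers): age column + one column per year
csv <- "age,1953,1954,1955,1956
0,10,12,11,13
10,20,21,19,22"

pop <- read.csv(text = csv)
names(pop)    # "age" "X1953" "X1954" "X1955" "X1956"

## Recover numeric years from the mangled names
years <- as.integer(sub("^X", "", names(pop)[-1]))

## Alternatively, keep the header exactly as written in the file
pop.raw <- read.csv(text = csv, check.names = FALSE)

## Age x year matrix, then sum the year columns within 5-year periods
m <- as.matrix(pop[, -1])
rownames(m) <- pop$age
period <- cut(years, breaks = seq(1950, 1960, by = 5), include.lowest = TRUE)
pop5 <- t(rowsum(t(m), period))
pop5
```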
[R] Data manipulation problem
Dear R´ers,
I´ve got a dataset with age and year of diagnosis. In order to age-standardize the incidence, I need to transform the data into a matrix with age groups (in 5- or 10-year bands) along one axis and year divided into 5-year periods along the other axis. Each cell should contain the number of cases for that age group and period. My data format now is: ID, age (to one decimal), year (yearly data). What I´d like is:

age    1960-1965  1966-1970  ...
0-5            3          8   10   15
6-10           2          5    8   13
...

Any good ideas?
Regards, M
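The usual base-R route is to cut() each variable into intervals and cross-tabulate the two factors with table(). A sketch on simulated case data follows; the cases data frame and its columns are hypothetical. right = FALSE is used so that the age bands are [0,5), [5,10), ..., which avoids gaps such as 5.0 to 6.0 for ages recorded to one decimal.

```r
set.seed(123)
## Hypothetical case-level data: one row per diagnosed case
cases <- data.frame(id   = 1:1000,
                    age  = round(runif(1000, 0, 99.9), 1),
                    year = sample(1960:2009, 1000, replace = TRUE))

## 5-year age bands [0,5), [5,10), ..., [95,100)
age5 <- cut(cases$age, breaks = seq(0, 100, by = 5), right = FALSE)
## 5-year periods [1960,1965), ..., [2005,2010)
yr5 <- cut(cases$year, breaks = seq(1960, 2010, by = 5), right = FALSE)

## Cases per age band (rows) and period (columns)
counts <- table(age = age5, period = yr5)
counts[1:3, 1:3]
```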
Re: [R] Data manipulation problem
I already did try the regression modelling approach. However, the epidemiologists (the referee) turn out to be quite fond of comparing the incidence rates to different standard populations, hence the need for this laborious approach. And trying the cutting approach, I ended up with:

table(age5)
age5
    (0,5]   (5,10]  (10,15]  (15,20]  (20,25]  (25,30]  (30,35]  (35,40]  (40,45]
       35       34       33       47       51      109      157      231      362
  (45,50]  (50,55]  (55,60]  (60,65]  (65,70]  (70,75]  (75,80]  (80,85] (85,100]
      511      745      926     1002      866      547      247       82       18

table(yr5)
yr5
 (1950,1955] (1955,1960] (1960,1965] (1965,1970] (1970,1975] (1975,1980]
           3           5           5           5           5           5
 (1980,1985] (1985,1990] (1990,1995] (1995,2000] (2000,2005] (2005,2009]
           5           5           5           5           5           3

table(yr5, age5)
Error in table(yr5, age5) : all arguments must have the same length

Sincerely, M

On 5. apr. 2010, at 20.59, Bert Gunter wrote:
> You have tempted, and being weak, I yield to temptation:
>> Any good ideas?
> Yes. Don't do this. (What you probably really want to do is fit a model with age as a factor, which can be done statistically, e.g. by logistic regression, or graphically using conditioning plots, e.g. via trellis graphics (the lattice package). This avoids the arbitrariness and discontinuities of binning by age range.)
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of moleps
> Sent: Monday, April 05, 2010 11:46 AM
> To: r-help@r-project.org
> Subject: [R] Data manipulation problem
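The yr5 counts above (3, then ten 5s, then 3, summing to 56) are exactly what one entry per calendar year from 1953 to 2008 would give, which suggests yr5 was cut from the sequence of years rather than from each case's year of diagnosis. table() needs one value per case in every argument. A sketch reproducing both situations with simulated data (the cases data frame is hypothetical), reusing the break points implied by the printed levels:

```r
set.seed(1)
## Hypothetical case-level data: one row per diagnosed case
cases <- data.frame(age  = round(runif(500, 0, 100), 1),
                    year = sample(1953:2008, 500, replace = TRUE))

yr.breaks  <- c(seq(1950, 2005, by = 5), 2009)
age.breaks <- c(seq(0, 85, by = 5), 100)

## Cutting the calendar sequence gives one value per year (56), not per case
yr5.years <- cut(1953:2008, breaks = yr.breaks)
length(yr5.years)
table(yr5.years)   # 3, 5, 5, ..., 5, 3 as in the output above

## Cutting the per-case variables gives two vectors of length nrow(cases)
age5 <- cut(cases$age,  breaks = age.breaks, include.lowest = TRUE)
yr5  <- cut(cases$year, breaks = yr.breaks)
table(yr5, age5)
```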
Re: [R] Data manipulation problem
Thx Erik, I have no idea what went wrong with the other code snippet, but this one works... Appreciate it.

qta <- table(cut(age, breaks = seq(0, 100, by = 10), include.lowest = TRUE),
             cut(year, breaks = seq(1950, 2010, by = 5), include.lowest = TRUE))

M

On 5. apr. 2010, at 21.45, Erik Iverson wrote:
> I don't know what your data are like, since you haven't given a reproducible example. I was imagining something like:
>
> ## generate fake data
> age <- sample(20:90, 100, replace = TRUE)
> year <- sample(1950:2000, 100, replace = TRUE)
> ## look at big table
> table(age, year)
> ## categorize data
> ## see include.lowest and right arguments to cut
> age.factor <- cut(age, breaks = seq(20, 90, by = 10), include.lowest = TRUE)
> year.factor <- cut(year, breaks = seq(1950, 2000, by = 10), include.lowest = TRUE)
> table(age.factor, year.factor)
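Since the stated aim is to age-standardize incidence against different standard populations, here is a hedged sketch of direct standardization once a case table like qta and a matching population table exist. All numbers, including the standard population, are made up. The standardized rate for each period is the standard-population-weighted average of the age-specific rates, sum(w * cases / pop) / sum(w).

```r
## Made-up case counts and populations (age groups x periods)
dn <- list(age = c("0-29", "30-59", "60+"), period = c("1960-64", "1965-69"))
cases <- matrix(c(2, 3, 5, 8, 10, 12), nrow = 3, dimnames = dn)
pop   <- matrix(c(1000, 800, 400, 1100, 850, 420), nrow = 3, dimnames = dn)

## Hypothetical standard population, one weight per age group
std.pop <- c(50000, 35000, 15000)

## Age-specific rates; the weight vector recycles down each column,
## so row i of every column is multiplied by std.pop[i]
rates <- cases / pop
asr <- colSums(rates * std.pop) / sum(std.pop)

## Directly standardized rates per 100,000
round(asr * 1e5, 1)
```

Swapping in a different std.pop vector is all it takes to compare against another standard population, which is the comparison the referee asked for.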