[R] Rounding and printing
Hello,

I am trying to print a table with numbers all rounded to the same number of digits (one after the decimal), but R seems to want to not print .0 for integers. I can go in and fix it one number at a time, but I'd like to understand the principle. Here's an example of the code. The problem is the 13th element, 21 or 21.0:

> nvb_deaths <- round(ss[,10]/100,digits=1)
> nvb_deaths
 [1] 56.5  1.6  0.2  3.9  0.1  2.2  0.2  2.6  1.5  4.1  1.1  6.1 21.0
> nvb_dths <- paste(nvb_deaths," (",round(100*nvb_deaths/nvb_deaths[1],digits=1),"%)",sep="")
> nvb_dths
 [1] "56.5 (100%)" "1.6 (2.8%)"  "0.2 (0.4%)"  "3.9 (6.9%)"  "0.1 (0.2%)"  "2.2 (3.9%)"
 [7] "0.2 (0.4%)"  "2.6 (4.6%)"  "1.5 (2.7%)"  "4.1 (7.3%)"  "1.1 (1.9%)"  "6.1 (10.8%)"
[13] "21 (37.2%)"
> print(nvb_deaths,digits=1)
 [1] 56.5  1.6  0.2  3.9  0.1  2.2  0.2  2.6  1.5  4.1  1.1  6.1 21.0
> paste(print(nvb_deaths,digits=1)," (",round(100*nvb_deaths/nvb_deaths[1],digits=1),"%)",sep="")
 [1] 56.5  1.6  0.2  3.9  0.1  2.2  0.2  2.6  1.5  4.1  1.1  6.1 21.0
 [1] "56.5 (100%)" "1.6 (2.8%)"  "0.2 (0.4%)"  "3.9 (6.9%)"  "0.1 (0.2%)"  "2.2 (3.9%)"
 [7] "0.2 (0.4%)"  "2.6 (4.6%)"  "1.5 (2.7%)"  "4.1 (7.3%)"  "1.1 (1.9%)"  "6.1 (10.8%)"
[13] "21 (37.2%)"

I'm running R v2.8.1 on Windows. Any help is much appreciated.

Cheers,
Alan Cohen

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
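One way to see the principle, as a minimal sketch: paste() converts numbers with as.character(), which drops trailing zeros, so the rounded value 21.0 becomes "21". Formatting to a fixed number of decimals with sprintf() keeps the ".0" (formatC() or format() with nsmall = 1 would be alternatives). The values below are typed in from the output in the post because ss is not shown, and pct is a name introduced only for this illustration.

```r
# Values copied from the printed output in the post (ss itself is not available)
nvb_deaths <- c(56.5, 1.6, 0.2, 3.9, 0.1, 2.2, 0.2, 2.6, 1.5, 4.1, 1.1, 6.1, 21.0)

# pct is a hypothetical helper name: each value as a percentage of the first
pct <- 100 * nvb_deaths / nvb_deaths[1]

# "%.1f" always prints exactly one digit after the decimal point;
# "%%" is a literal percent sign. sprintf() is vectorised over its arguments.
nvb_dths <- sprintf("%.1f (%.1f%%)", nvb_deaths, pct)

nvb_dths[13]   # the element that lost its ".0" when paste() was used
```

Note that with this format the first element reads "56.5 (100.0%)" rather than "56.5 (100%)", because the percentages are also fixed to one decimal.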
[R] Drawing lines in margins
Hi all,

Quick question: What function can I use to draw a line in the margin of a plot? segments() and lines() both stop at the margin. In case the answer depends on exactly what I'm trying to do, see below. I'm using R v. 2.8.1 on Windows XP.

Cheers,
Alan

I'm trying to make a horizontal barplot with a column of numbers on the right side. I'd like to put a line between the column header and the numbers. The following reconstructs the idea - just copy and paste it in:

aa <- 1:10
plot.mtx2 <- cbind(aa,aa+1)
colnames(plot.mtx2) <- c("Male","Female")
lci2 <- cbind(aa-1,aa)
uci2 <- cbind(aa+1,aa+2)
par(mar=c(5,6,4,5))
cols <- c("grey79","grey41")
bplot2 <- barplot(t(plot.mtx2),beside=TRUE,xlab="Malaria death rates per 100,000",
    names.arg=paste("state",aa,sep=""),legend.text=F,las=1,xlim=c(0,13), horiz=T,
    col=cols, main="Malaria death rates by state and sex")
legend(8,6,legend=c("Female","Male"),fill=cols[order(2:1)])
segments(y0=bplot2, y1=bplot2, x0=t(lci2), x1=t(uci2))
mtext(10*(aa+1),side=4,line=4,at=seq(3,3*length(aa),by=3)-0.35,padj=0.5,adj=1,las=1,cex=0.85)
mtext(10*aa,side=4,line=4,at=seq(2,3*length(aa)-1,by=3)-0.65,padj=0.5,adj=1,las=1,cex=0.85)
mtext("Estimated",side=4,line=3,at=3*length(aa)+2.75,padj=0.5,adj=0.5,las=1,cex=0.85)
mtext("Deaths",side=4,line=3,at=3*length(aa)+1.25,padj=0.5,adj=0.5,las=1,cex=0.85)
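A possible approach, sketched under assumptions: base graphics clip drawing to the plot region by default, and the xpd graphical parameter controls that clipping. Setting par(xpd = TRUE) lets segments() and lines() draw into the figure margins. The plot and the line position below are hypothetical stand-ins rather than the barplot from the post, and the PDF device is used only so the sketch runs without a screen.

```r
# Off-screen device so the sketch runs anywhere; on a desktop this line can be dropped
pdf(file = tempfile(fileext = ".pdf"))

par(mar = c(5, 6, 4, 5))          # same margins as in the post
plot(1:10, 1:10)                  # stand-in plot

usr <- par("usr")                 # plot-region limits in user coordinates: x1, x2, y1, y2
old <- par(xpd = TRUE)            # clip to the figure region instead of the plot region

# Hypothetical position: a short horizontal rule just right of the plot region
x0 <- usr[2] + 0.05 * (usr[2] - usr[1])
x1 <- usr[2] + 0.15 * (usr[2] - usr[1])
segments(x0 = x0, y0 = 9, x1 = x1, y1 = 9)

par(old)                          # restore the previous clipping setting
dev.off()
```

segments() also accepts xpd directly as an argument, which avoids changing the setting for later calls.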
[R] Long to wide format without time variable
Hi all,

I am trying to convert a data set of physician death codings (each individual's cause of death is coded by multiple physicians) from long to wide format, but the reshape function doesn't seem to work because it requires a time variable to identify the sequence among the repeated observations within individuals. My data set has no order, and different numbers of physicians code each death, up to 23. It is also quite large, so for-loops are very slow, and I'll need to repeat the procedure multiple times. So I'm looking for a processor-efficient way to replicate reshape without a time variable.

Thanks in advance for any help you can provide. A worked example and some code I've tried are below. I'm working with R v2.8.1 on Windows XP Professional.

Cheers,
Alan Cohen

Here's what my data look like now:

> id <- rep(1:5,2)
> COD <- c("A01","A02","A03","A04","A05","B01","A02","B03","B04","A05")
> MDid <- c(1:6,3,5,7,2)
> data <- as.data.frame(cbind(id,COD,MDid))
> data
   id COD MDid
1   1 A01    1
2   2 A02    2
3   3 A03    3
4   4 A04    4
5   5 A05    5
6   1 B01    6
7   2 A02    3
8   3 B03    5
9   4 B04    7
10  5 A05    2

And here's what I'd like them to look like:

> id2 <- 1:5
> COD.1 <- c("A01","A02","A03","A04","A05")
> COD.2 <- c("B01","A02","B03","B04","A05")
> MDid.1 <- 1:5
> MDid.2 <- c(6,3,5,7,2)
> data.wide <- as.data.frame(cbind(id2,COD.1,COD.2,MDid.1,MDid.2))
> data.wide
  id2 COD.1 COD.2 MDid.1 MDid.2
1   1   A01   B01      1      6
2   2   A02   A02      2      3
3   3   A03   B03      3      5
4   4   A04   B04      4      7
5   5   A05   A05      5      2

Here's the for-loop that's very slow (with or without the if-clauses activated):

ids<-unique(data$id)
ct<-length(ids)
codes<-matrix(0,ct,11)
colnames(codes)<-c("ID","ICD1","Coder1","ICD2","Coder2","ICD3","Coder3","ICD4","Coder4","ICD5","Coder5")
j<-0
for (i in 1:ct){
kkk <- ids[i]
rpt<-data[data$id==kkk,]
j<-max(j,nrow(rpt))
codes[i,1]<-kkk
codes[i,2]<-rpt$ICDCode[1]
codes[i,3]<-rpt$T_Physician_ID[1]
#if (nrow(rpt)>=2){
codes[i,4]<-rpt$ICDCode[2]
codes[i,5]<-rpt$T_Physician_ID[2]
#if (nrow(rpt)>=3) {
codes[i,6]<-rpt$ICDCode[3]
codes[i,7]<-rpt$T_Physician_ID[3]
#if (nrow(rpt)>=4) {
codes[i,8]<-rpt$ICDCode[4]
codes[i,9]<-rpt$T_Physician_ID[4]
#if (nrow(rpt)>=5) {
codes[i,10]<-rpt$ICDCode[5]
codes[i,11]<-rpt$T_Physician_ID[5]
# }
}
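One way this could be approached, as a sketch rather than a tested solution for the full data set: build an artificial sequence number within each id and hand that to reshape() as the time variable. The column name seq is hypothetical, and the numbering simply follows the row order in which each id's codings appear.

```r
# Worked example from the post, built with data.frame() so that MDid stays numeric
id   <- rep(1:5, 2)
COD  <- c("A01", "A02", "A03", "A04", "A05", "B01", "A02", "B03", "B04", "A05")
MDid <- c(1:6, 3, 5, 7, 2)
data <- data.frame(id, COD, MDid, stringsAsFactors = FALSE)

# seq (hypothetical name): 1, 2, ... within each id, in order of appearance.
# ave() applies seq_along() to each group and returns the result in the original row order.
data$seq <- ave(seq_along(data$id), data$id, FUN = seq_along)

# With the artificial sequence as timevar, reshape() produces COD.1, MDid.1, COD.2, MDid.2
data.wide <- reshape(data, idvar = "id", timevar = "seq", direction = "wide")
data.wide
```

Individuals with fewer codings than the maximum get NA in the unused columns, which matches the varying number of physicians per death.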
[R] Problem with apply
Hi R users,

I am trying to assign ages to age classes for a large data set (123,000 records), and using a for-loop was too slow, so I wrote a function and used apply. However, the function does not properly assign the first two classes (the rest are fine). It appears that when age is one digit, it does not get assigned properly.

I tried to provide a small-scale work-up (at the end of the email) but it does not reproduce the problem; the best I can do is to provide my code and the output below. As you can see, I've confirmed that age is numeric, that all values are integers, and that pieces of the code work independently. Any thoughts would be appreciated.

To add to the mystery, depending which rows of my data set I select, I get different problems. mds[1:100,] gives the problem above, as do mds[100:200,], mds[150:250,] and mds[1:10100,]. However, with mds[200:300,], mds[250:350,] and mds[1000:1100,], only ages with 3 digits are correctly assigned - all ages <100 are returned as NA.

I'm using R v 2.8.1 on Windows XP.

Cheers,
Alan Cohen
Centre for Global Health Research, Toronto, ON

> ageassign <- function(x){
+ y <- NA
+ if (x[11] %in% c(0:4)) {y <- "0-4"}
+ else if (x[11] %in% c(5:14)) {y <- "5-14" }
+ else if (x[11] %in% c(15:29)) {y <- "15-29" }
+ else if (x[11] %in% c(30:69)) {y <- "30-69"}
+ else if (x[11] %in% c(70:79)) {y <- "70-79"}
+ else if (x[11] %in% c(80:125)) {y <- "80+"}
+ return(y)
+ }
> jj <- apply(mds[1:100,],1,FUN=ageassign)
> jj
      1       2       3       4       5       6       7       8       9      10      11      12      13
     NA   "80+" "30-69" "30-69"   "80+"      NA "30-69" "30-69" "70-79" "15-29" "15-29" "30-69" "70-79"
     14      15      16      17      18      19      20      21      22      23      24      25      26
  "80+"      NA "30-69" "30-69" "30-69"   "80+"   "80+" "15-29" "70-79" "30-69" "70-79" "70-79" "30-69"
     27      28      29      30      31      32      33      34      35      36      37      38      39
"70-79"   "80+"      NA   "80+" "70-79"      NA "15-29" "15-29"      NA      NA "70-79" "30-69" "30-69"
     40      41      42      43      44      45      46      47      48      49      50      51      52
"70-79" "30-69" "30-69" "30-69" "70-79" "30-69" "30-69" "70-79" "15-29" "30-69"      NA "15-29" "30-69"
     53      54      55      56      57      58      59      60      61      62      63      64      65
"30-69"      NA "70-79" "30-69" "30-69" "30-69" "30-69" "15-29" "30-69" "30-69" "70-79" "30-69"      NA
     66      67      68      69      70      71      72      73      74      75      76      77      78
"30-69" "30-69" "30-69" "30-69" "30-69"   "80+" "30-69"   "80+" "70-79" "30-69" "30-69" "30-69"      NA
     79      80      81      82      83      84      85      86      87      88      89      90      91
"30-69" "30-69" "30-69"      NA   "80+" "30-69" "30-69" "30-69"      NA "15-29" "30-69" "30-69" "30-69"
     92      93      94      95      96      97      98      99     100
"30-69" "30-69" "30-69" "30-69" "70-79" "30-69" "30-69" "30-69" "30-69"
> mds[1:100,11]
  [1]  3 82 40 35 82  1 37 57 71 22 21 52 73 86  1 43 60 63 84 88 29 73 69 75 73 43 75 83  4 83 77  1 27
 [34] 15  1  6 76 51 45 71 54 64 69 70 48 38 74 26 37  4 18 63 59  8 78 63 67 62 50 21 66 69 75 57  4 50
 [67] 58 60 61 62 83 69 92 75 30 49 69  1 69 63 69  0 93 64 59 69  2 25 32 60 66 67 54 53 64 79 59 49 59
[100] 64
> table(mds[,11])

   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19
3123 6441 3856 2884 1968 1615 1386 1088 1098  721  943  681  511  380  426  835  571  555  719  653
  20   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35   36   37   38   39
 879  715  672  631  655  773  680  713  769  538  685  566  729  702  652  766  683  723  821  675
  40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   55   56   57   58   59
 774  650  908  892  784  925  781 1043 1161  924 1087  827 1261 1356 1297 1272 1277 1614 1831 1523
  60   61   62   63   64   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79
1702 1251 1954 2157 1901 2090 1874 2705 3085 2529 2488 1777 2701 2586 2308 2020 1801 2269 2486 1856
  80   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95   96   97   98   99
1762 1047 1413 1326  967 1013  753  870  884  531  601  277  364  301  193  288  149  174  169  470
 100  101  102  103  104  105  106  107  108  114  115  117  118  120  125
1525724112112221
> mode(mds[,11])
[1] "numeric"
> mds[1,11] %in% c(0:4)
[1] TRUE
> if (mds[1,11] %in% c(0:4)) {y <- "0-4"}
> y
[1] "0-4"

> xx <- matrix(trunc(runif(30,0,125)),15,2)
> aassign <- function(x){
+ y <- NA
+ if (x[2] %in% c(0:4)) {y <- "0-4"}
+ else if (x[2] %in% c(5:14)) {y <- "5-14" }
+ else if (x[2] %in% c(15:29)) {y <- "15-29" }
+ else if (x[2] %in% c(30:69)) {y <- "30-69"}
+ else if (x[2] %in% c(70:79)) {y <- "70-79"}
+ else if (x[2] %in% c(80:125)) {y <- "80+"}
+ return(y)
+ }
> jj <- apply(xx,1,FUN=aassign)
> t(xx)
 [,1
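A likely explanation, offered as a sketch because mds itself is not shown: apply() converts a data frame with as.matrix() before looping over rows. If any column is non-numeric, the result is a character matrix, and numeric columns are passed through format(), which pads them to a common width. An age of 3 then arrives in the function as " 3" (or "  3" when the subset contains a three-digit age), which does not match "3" in the %in% test. The all-numeric matrix xx has no such conversion, which would explain why the small example behaves. The data frame mds_demo below is a hypothetical stand-in, and cut() on the column itself is one way to avoid the row-wise loop entirely.

```r
# Hypothetical stand-in for mds: a numeric age column next to a character column
mds_demo <- data.frame(age = c(3, 82, 40, 15, 6, 71, 100),
                       sex = c("M", "F", "F", "M", "F", "M", "F"),
                       stringsAsFactors = FALSE)

# What apply() sees: a character matrix in which the ages are padded with spaces
padded <- as.matrix(mds_demo)[, "age"]
padded

# Vectorised alternative: classify the numeric column directly.
# right = FALSE makes the intervals [0,5), [5,15), ..., [80,Inf).
# The upper bound Inf is an assumption; the original function stops at 125.
breaks <- c(0, 5, 15, 30, 70, 80, Inf)
labels <- c("0-4", "5-14", "15-29", "30-69", "70-79", "80+")
mds_demo$ageclass <- cut(mds_demo$age, breaks = breaks, labels = labels, right = FALSE)
mds_demo
```

cut() returns a factor; as.character() converts it if plain strings are needed.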
[R] Weighted principal components analysis?
Hello R-ers,

I'm trying to do a weighted principal components analysis. I couldn't find any such option with princomp or prcomp. Does anyone know of a package or way to do this?

More specifically, the observations I'm working with are averages from populations of varying sizes. I thus need to weight the observations by sample size. Ideally I could apply these weights at the cell level (i.e., allowing sample size to vary within observations across variables), but even applying them just to the observations would get me most of the way there.

I'm using R v2.8.1 on Windows XP. I've searched Help and the R site and had no luck. Thanks for any help you can provide.

Cheers,
Alan Cohen
Centre for Global Health Research
Toronto, Ontario
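One possible route for observation-level weights, sketched with made-up data: cov.wt() in the stats package computes a weighted covariance matrix and weighted means, and an eigendecomposition of that matrix gives the components. The data x and the sample sizes n below are hypothetical, and this does not address the cell-level weighting mentioned above.

```r
set.seed(1)
x <- matrix(rnorm(40), nrow = 10, ncol = 4)    # hypothetical: 10 population averages on 4 variables
n <- c(5, 10, 20, 40, 80, 5, 10, 20, 40, 80)   # hypothetical sample size behind each average

w  <- n / sum(n)                # weights proportional to sample size, summing to 1
cw <- cov.wt(x, wt = w)         # weighted covariance (cw$cov) and weighted means (cw$center)

e        <- eigen(cw$cov, symmetric = TRUE)
sdev     <- sqrt(e$values)      # component standard deviations, largest first
loadings <- e$vectors           # columns are the component directions

# Scores: centre each variable at its weighted mean, then project onto the loadings
scores <- sweep(x, 2, cw$center) %*% loadings
```

Whether to work from the covariance or the correlation matrix is the same decision as in unweighted PCA; cov.wt(x, wt = w, cor = TRUE) also returns the weighted correlation matrix.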
[R] Using apply to get group means
Hi all,

I'm trying to improve my R skills and make my programming more efficient and succinct. I can solve the following question, but wonder if there's a better way to do it:

I'm trying to calculate mean by several variables and then put this back into the original data set as a new variable. For example, if I were measuring weight, I might want to have each individual's weight, and also the group mean by, say, race, sex, and geographic region. The following code works:

> x1<-rep(c("A","B","C"),3)
> x2<-c(rep(1,3),rep(2,3),1,2,1)
> x3<-c(1,2,3,4,5,6,2,6,4)
> x<-as.data.frame(cbind(x1,x2,x3))
> x3.mean<-rep(0,nrow(x))
> for (i in 1:nrow(x)){
+ x3.mean[i]<-mean(as.numeric(x[,3][x[,1]==x[,1][i]&x[,2]==x[,2][i]]))
+ }
> cbind(x,x3.mean)
  x1 x2 x3 x3.mean
1  A  1  1     1.5
2  B  1  2     2.0
3  C  1  3     3.5
4  A  2  4     4.0
5  B  2  5     5.5
6  C  2  6     6.0
7  A  1  2     1.5
8  B  2  6     5.5
9  C  1  4     3.5

However, I'd love to be able to do this with apply rather than a for-loop. Or is there a built-in function? Any suggestions? Also, any way to avoid the hassles with having to convert to a data frame and then again to numeric when one variable is character?

Cheers,
Alan Cohen
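A built-in that seems to fit, shown as a minimal sketch on the example data: ave() computes a function within groups defined by one or more grouping variables and returns a vector aligned with the original rows. Building the data frame with data.frame() instead of as.data.frame(cbind(...)) keeps x2 and x3 numeric, so no conversion back is needed.

```r
# data.frame() keeps each column's own type; cbind() would coerce everything to character
x <- data.frame(x1 = rep(c("A", "B", "C"), 3),
                x2 = c(rep(1, 3), rep(2, 3), 1, 2, 1),
                x3 = c(1, 2, 3, 4, 5, 6, 2, 6, 4))

# Mean of x3 within each combination of x1 and x2, returned in the original row order
x$x3.mean <- ave(x$x3, x$x1, x$x2, FUN = mean)
x
```

FUN has to be named explicitly, because unnamed arguments after the first are treated as grouping variables.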
[R] Memory limits for large data sets
Hello,

I have several very large data sets (1-7 million observations, sometimes hundreds of variables) that I'm trying to work with in R, and memory seems to be a big issue. I'm currently using a 2 GB Windows setup, but might have the option to run R on a server remotely. Windows R seems basically limited to 2 GB memory if I'm right; is there the possibility to go much beyond that with server-based R? In other words, am I limited by R or by my hardware, and how much might R be able to handle if I get the hardware necessary?

Also, any possibility of using web-based R for this kind of thing?

Cheers,
Alan Cohen

Alan Cohen
Post-doctoral Fellow
Centre for Global Health Research
70 Richmond St. East, Suite 202A
Toronto, ON M5C 1N8 Canada
(416) 854-3121 (cell)
(416) 864-6060 ext. 3156 (office)
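As a rough sketch of the arithmetic behind the question: R stores a numeric (double) value in 8 bytes, so a data set held entirely as doubles needs about rows x columns x 8 bytes before any working copies are made. The figure of 200 columns is a hypothetical stand-in for "hundreds of variables".

```r
rows <- c(1e6, 7e6)   # range of observations mentioned in the post
cols <- 200           # hypothetical number of variables

# Approximate size in GiB if every value is stored as an 8-byte double
gib <- rows * cols * 8 / 2^30
round(gib, 2)

# Sanity check of the 8-bytes-per-value assumption on a small vector
small <- as.numeric(object.size(numeric(1e6)))   # size in bytes, including a small header
small
```

Under these assumptions the larger data sets would not fit in 2 GB of address space even as a single copy, whatever the platform, so the estimate gives a lower bound on the memory to ask for.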