[R] Rounding and printing

2009-10-29 Thread Alan Cohen
Hello,

I am trying to print a table with numbers all rounded to the same number of 
digits (one after the decimal), but R does not print the .0 for whole 
numbers.  I can go in and fix it one number at a time, but I'd like to 
understand the principle.  Here's an example of the code.  The problem is the 
13th element, 21 or 21.0:
> nvb_deaths <- round(ss[,10]/100,digits=1)
> nvb_deaths
 [1] 56.5  1.6  0.2  3.9  0.1  2.2  0.2  2.6  1.5  4.1  1.1  6.1 21.0
> nvb_dths <- paste(nvb_deaths, " (",round(100*nvb_deaths/nvb_deaths[1],digits=1),"%)",sep="")
> nvb_dths
 [1] "56.5 (100%)" "1.6 (2.8%)"  "0.2 (0.4%)"  "3.9 (6.9%)"  "0.1 (0.2%)"  "2.2 (3.9%)"
 [7] "0.2 (0.4%)"  "2.6 (4.6%)"  "1.5 (2.7%)"  "4.1 (7.3%)"  "1.1 (1.9%)"  "6.1 (10.8%)"
[13] "21 (37.2%)"
> print(nvb_deaths,digits=1)
 [1] 56.5  1.6  0.2  3.9  0.1  2.2  0.2  2.6  1.5  4.1  1.1  6.1 21.0
> paste(print(nvb_deaths,digits=1), " (",round(100*nvb_deaths/nvb_deaths[1],digits=1),"%)",sep="")
 [1] 56.5  1.6  0.2  3.9  0.1  2.2  0.2  2.6  1.5  4.1  1.1  6.1 21.0
 [1] "56.5 (100%)" "1.6 (2.8%)"  "0.2 (0.4%)"  "3.9 (6.9%)"  "0.1 (0.2%)"  "2.2 (3.9%)"
 [7] "0.2 (0.4%)"  "2.6 (4.6%)"  "1.5 (2.7%)"  "4.1 (7.3%)"  "1.1 (1.9%)"  "6.1 (10.8%)"
[13] "21 (37.2%)"

I'm running R v2.8.1 on Windows.  Any help is much appreciated.
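
One possible explanation and workaround, offered as a sketch rather than a definitive answer: paste() converts numbers with as.character(), which drops trailing zeros, so 21.0 becomes "21" no matter how it was rounded or printed beforehand. Turning the numbers into fixed-decimal strings first, for example with sprintf(), keeps the ".0". The vector below is a hypothetical stand-in for round(ss[,10]/100,digits=1), since ss is not shown in the post.

```r
# Hypothetical stand-in for round(ss[,10]/100, digits=1); 'ss' is not available here
nvb_deaths <- c(56.5, 1.6, 0.2, 3.9, 0.1, 2.2, 0.2, 2.6, 1.5, 4.1, 1.1, 6.1, 21.0)

# Percentage of the first element, as in the original post
pct <- 100 * nvb_deaths / nvb_deaths[1]

# "%.1f" always prints exactly one decimal; "%%" is a literal percent sign
nvb_dths <- sprintf("%.1f (%.1f%%)", nvb_deaths, pct)
nvb_dths[13]
```

Note that this also turns "100%" into "100.0%" for the first element. formatC(nvb_deaths, format="f", digits=1) and format(nvb_deaths, nsmall=1) are alternatives if the strings are to be assembled with paste().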

Cheers,
Alan Cohen

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Drawing lines in margins

2009-07-29 Thread Alan Cohen
Hi all,

Quick question: What function can I use to draw a line in the margin of a plot? 
 segments() and lines() both stop at the margin.

In case the answer depends on exactly what I'm trying to do, see below.  I'm 
using R v. 2.8.1 on Windows XP.

Cheers,
Alan

I'm trying to make a horizontal barplot with a column of numbers on the right 
side.  I'd like to put a line between the column header and the numbers.  The 
following reconstructs the idea - just copy and paste it in:
aa <- 1:10
plot.mtx2 <- cbind(aa,aa+1)
colnames(plot.mtx2) <- c("Male","Female")
lci2 <- cbind(aa-1,aa)
uci2 <- cbind(aa+1,aa+2)
par(mar=c(5,6,4,5))
cols <- c("grey79","grey41")
bplot2 <- barplot(t(plot.mtx2),beside=TRUE,xlab="Malaria death rates per 100,000",
names.arg=paste("state",aa,sep=""),legend.text=FALSE,las=1,xlim=c(0,13),horiz=TRUE,
col=cols,
main="Malaria death rates by state and sex")
legend(8,6,legend=c("Female","Male"),fill=cols[order(2:1)])
segments(y0=bplot2, y1=bplot2, x0=t(lci2), x1=t(uci2))
mtext(10*(aa+1),side=4,line=4,at=seq(3,3*length(aa),by=3)-0.35,padj=0.5,adj=1,las=1,cex=0.85)
mtext(10*aa,side=4,line=4,at=seq(2,3*length(aa)-1,by=3)-0.65,padj=0.5,adj=1,las=1,cex=0.85)
mtext("Estimated",side=4,line=3,at=3*length(aa)+2.75,padj=0.5,adj=0.5,las=1,cex=0.85)
mtext("Deaths",side=4,line=3,at=3*length(aa)+1.25,padj=0.5,adj=0.5,las=1,cex=0.85)
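
A possible approach to the margin question, sketched under assumptions: segments() and lines() stop at the margin because drawing is clipped to the plot region by default (par("xpd") is FALSE). Setting xpd=NA turns clipping off, after which the same functions can draw in the margins using ordinary user coordinates. The bar heights and the position of the line below are made up for illustration, and pdf(NULL) is used only so the sketch runs without opening a window.

```r
pdf(NULL)                        # null device; drop this line to draw on screen
par(mar = c(5, 6, 4, 5))
bp <- barplot(1:10, horiz = TRUE, xlim = c(0, 13))   # hypothetical bars

usr <- par("usr")                # plot-region limits: x-left, x-right, y-bottom, y-top
old <- par(xpd = NA)             # switch clipping off; 'old' holds the previous setting

# Horizontal line in the right margin, level with the top of the plot region
segments(x0 = usr[2] + 0.5, x1 = usr[2] + 2, y0 = usr[4], y1 = usr[4])

par(old)                         # restore the previous clipping setting
dev.off()
```

In the barplot above, the y positions passed to mtext(at=...) are in the same user coordinates, so a line between the column header and the numbers could be placed with a y0/y1 value between the two.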



[R] Long to wide format without time variable

2009-06-23 Thread Alan Cohen
Hi all,

I am trying to convert a data set of physician death codings (each individual's 
cause of death is coded by multiple physicians) from long to wide format, but 
the reshape function doesn't seem to work because it requires a time 
variable to identify the sequence among the repeated observations within 
individuals.  My data set has no order, and different numbers of physicians 
code each death, up to 23.  It is also quite large, so for-loops are very slow, 
and I'll need to repeat the procedure multiple times.  So I'm looking for a 
processor-efficient way to replicate reshape without a time variable.

Thanks in advance for any help you can provide.  A worked example and some code 
I've tried are below.  I'm working with R v2.8.1 on Windows XP Professional.

Cheers,
Alan Cohen

Here's what my data look like now:

> id <- rep(1:5,2)
> COD <- c("A01","A02","A03","A04","A05","B01","A02","B03","B04","A05")
> MDid <- c(1:6,3,5,7,2)
> data <- as.data.frame(cbind(id,COD,MDid))
> data
   id COD MDid
1   1 A01    1
2   2 A02    2
3   3 A03    3
4   4 A04    4
5   5 A05    5
6   1 B01    6
7   2 A02    3
8   3 B03    5
9   4 B04    7
10  5 A05    2

And here's what I'd like them to look like:

> id2 <- 1:5
> COD.1 <- c("A01","A02","A03","A04","A05")
> COD.2 <- c("B01","A02","B03","B04","A05")
> MDid.1 <- 1:5
> MDid.2 <- c(6,3,5,7,2)
> data.wide <- as.data.frame(cbind(id2,COD.1,COD.2,MDid.1,MDid.2))
> data.wide
  id2 COD.1 COD.2 MDid.1 MDid.2
1   1   A01   B01      1      6
2   2   A02   A02      2      3
3   3   A03   B03      3      5
4   4   A04   B04      4      7
5   5   A05   A05      5      2

Here's the for-loop that's very slow (with or without the if-clauses activated):

ids <- unique(data$id)
ct <- length(ids)
codes <- matrix(0,ct,11)
colnames(codes) <- c("ID","ICD1","Coder1","ICD2","Coder2","ICD3","Coder3","ICD4","Coder4","ICD5","Coder5")
j <- 0
for (i in 1:ct){
  kkk <- ids[i]
  rpt <- data[data$id==kkk,]
  j <- max(j,nrow(rpt))
  codes[i,1] <- kkk
  codes[i,2] <- rpt$ICDCode[1]
  codes[i,3] <- rpt$T_Physician_ID[1]
  #if (nrow(rpt)>=2){
  codes[i,4] <- rpt$ICDCode[2]
  codes[i,5] <- rpt$T_Physician_ID[2]
  #if (nrow(rpt)>=3) {
  codes[i,6] <- rpt$ICDCode[3]
  codes[i,7] <- rpt$T_Physician_ID[3]
  #if (nrow(rpt)>=4) {
  codes[i,8] <- rpt$ICDCode[4]
  codes[i,9] <- rpt$T_Physician_ID[4]
  #if (nrow(rpt)>=5) {
  codes[i,10] <- rpt$ICDCode[5]
  codes[i,11] <- rpt$T_Physician_ID[5]
  #
}
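
A possible loop-free route, sketched under the assumption that any arbitrary ordering of the codings within an individual is acceptable: reshape() does not need a real time variable, only an index that distinguishes the repeated rows within each id. ave() with seq_along can build such a counter in one vectorised call. The column name obs is hypothetical, and the data are the worked example from the post rebuilt with data.frame().

```r
# Worked example from the post, built directly as a data frame
data <- data.frame(
  id   = rep(1:5, 2),
  COD  = c("A01","A02","A03","A04","A05","B01","A02","B03","B04","A05"),
  MDid = c(1:6, 3, 5, 7, 2),
  stringsAsFactors = FALSE
)

# Within-id counter (1, 2, ...) in the order the rows happen to appear;
# 'obs' is a made-up name standing in for the missing time variable
data$obs <- ave(seq_along(data$id), data$id, FUN = seq_along)

# Long to wide: one row per id, one COD.k / MDid.k pair per coding
data.wide <- reshape(data, idvar = "id", timevar = "obs", direction = "wide")
data.wide
```

Individuals with fewer codings than the maximum get NA in the unused columns, so differing numbers of physicians per death should not need special handling.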



[R] Problem with apply

2009-04-22 Thread Alan Cohen
Hi R users,

I am trying to assign ages to age classes for a large data set (123,000 
records), and using a for-loop was too slow, so I wrote a function and used 
apply.  However, the function does not properly assign the first two classes 
(the rest are fine).  It appears that when age is one digit, it does not get 
assigned properly.  

I tried to provide a small-scale work-up (at the end of the email) but it does 
not reproduce the problem; the best I can do is to provide my code and the 
output below.  As you can see, I've confirmed that age is numeric, that all 
values are integers, and that pieces of the code work independently.  Any 
thoughts would be appreciated.  

To add to the mystery, depending which rows of my data set I select, I get 
different problems.  mds[1:100,] gives the problem above, as do mds[100:200,] , 
mds[150:250,] and mds[1:10100,].  However, with mds[200:300,], 
mds[250:350,] and mds[1000:1100,], only ages with 3 digits are correctly 
assigned - all ages <100 are returned as NA.

I'm using R v 2.8.1 on Windows XP.

Cheers,
Alan Cohen
Centre for Global Health Research, 
Toronto,ON

> ageassign <- function(x){
+   y <- NA
+   if (x[11] %in% c(0:4)) {y <- "0-4"}
+   else if (x[11] %in% c(5:14)) {y <- "5-14" }
+   else if (x[11] %in% c(15:29)) {y <- "15-29" }
+   else if (x[11] %in% c(30:69)) {y <- "30-69"}
+   else if (x[11] %in% c(70:79)) {y <- "70-79"}
+   else if (x[11] %in% c(80:125)) {y <- "80+"}
+   return(y)
+ }
> jj <- apply(mds[1:100,],1,FUN=ageassign)
> jj
      1       2       3       4       5       6       7       8       9      10      11      12      13
     NA   "80+" "30-69" "30-69"   "80+"      NA "30-69" "30-69" "70-79" "15-29" "15-29" "30-69" "70-79"
     14      15      16      17      18      19      20      21      22      23      24      25      26
  "80+"      NA "30-69" "30-69" "30-69"   "80+"   "80+" "15-29" "70-79" "30-69" "70-79" "70-79" "30-69"
     27      28      29      30      31      32      33      34      35      36      37      38      39
"70-79"   "80+"      NA   "80+" "70-79"      NA "15-29" "15-29"      NA      NA "70-79" "30-69" "30-69"
     40      41      42      43      44      45      46      47      48      49      50      51      52
"70-79" "30-69" "30-69" "30-69" "70-79" "30-69" "30-69" "70-79" "15-29" "30-69"      NA "15-29" "30-69"
     53      54      55      56      57      58      59      60      61      62      63      64      65
"30-69"      NA "70-79" "30-69" "30-69" "30-69" "30-69" "15-29" "30-69" "30-69" "70-79" "30-69"      NA
     66      67      68      69      70      71      72      73      74      75      76      77      78
"30-69" "30-69" "30-69" "30-69" "30-69"   "80+" "30-69"   "80+" "70-79" "30-69" "30-69" "30-69"      NA
     79      80      81      82      83      84      85      86      87      88      89      90      91
"30-69" "30-69" "30-69"      NA   "80+" "30-69" "30-69" "30-69"      NA "15-29" "30-69" "30-69" "30-69"
     92      93      94      95      96      97      98      99     100
"30-69" "30-69" "30-69" "30-69" "70-79" "30-69" "30-69" "30-69" "30-69"
> mds[1:100,11]
  [1]  3 82 40 35 82  1 37 57 71 22 21 52 73 86  1 43 60 63 84 88 29 73 69 75 73 43 75 83  4 83 77  1 27
 [34] 15  1  6 76 51 45 71 54 64 69 70 48 38 74 26 37  4 18 63 59  8 78 63 67 62 50 21 66 69 75 57  4 50
 [67] 58 60 61 62 83 69 92 75 30 49 69  1 69 63 69  0 93 64 59 69  2 25 32 60 66 67 54 53 64 79 59 49 59
[100] 64
> table(mds[,11])

   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19
3123 6441 3856 2884 1968 1615 1386 1088 1098  721  943  681  511  380  426  835  571  555  719  653
  20   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35   36   37   38   39
 879  715  672  631  655  773  680  713  769  538  685  566  729  702  652  766  683  723  821  675
  40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   55   56   57   58   59
 774  650  908  892  784  925  781 1043 1161  924 1087  827 1261 1356 1297 1272 1277 1614 1831 1523
  60   61   62   63   64   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79
1702 1251 1954 2157 1901 2090 1874 2705 3085 2529 2488 1777 2701 2586 2308 2020 1801 2269 2486 1856
  80   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95   96   97   98   99
1762 1047 1413 1326  967 1013  753  870  884  531  601  277  364  301  193  288  149  174  169  470
 100  101  102  103  104  105  106  107  108  114  115  117  118  120  125 
  1525724112112221 
> mode(mds[,11])
[1] "numeric"

> mds[1,11] %in% c(0:4)
[1] TRUE
> if (mds[1,11] %in% c(0:4)) {y <- "0-4"}
> y
[1] "0-4"

> xx <- matrix(trunc(runif(30,0,125)),15,2)
> aassign <- function(x){
+   y <- NA
+   if (x[2] %in% c(0:4)) {y <- "0-4"}
+   else if (x[2] %in% c(5:14)) {y <- "5-14" }
+   else if (x[2] %in% c(15:29)) {y <- "15-29" }
+   else if (x[2] %in% c(30:69)) {y <- "30-69"}
+   else if (x[2] %in% c(70:79)) {y <- "70-79"}
+   else if (x[2] %in% c(80:125)) {y <- "80+"}
+   return(y)
+ }
> jj <- apply(xx,1,FUN=aassign)
> t(xx)
 [,1
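
A likely explanation, stated as an assumption because the structure of mds is not shown: if mds is a data frame with at least one non-numeric column, apply() first converts it with as.matrix(), which turns every cell into a string and pads numeric columns to a common width. An age of 3 then arrives in the function as " 3", which matches nothing in 0:4. When the selected rows include a three-digit age, two-digit ages are padded as well. The all-numeric test matrix xx is not converted, so it cannot reproduce the problem. The sketch below shows the padding on a made-up data frame and a vectorised alternative with cut(); the upper bound Inf replaces the 125 used in the post.

```r
# Hypothetical stand-in for mds: one character column plus a numeric age column
mds_demo <- data.frame(name = c("a", "b", "c", "d"),
                       age  = c(3, 82, 40, 6),
                       stringsAsFactors = FALSE)

# What apply() sees: a character matrix, with ages padded to equal width
m <- as.matrix(mds_demo)
padded <- m[, "age"]
padded[1] %in% 0:4        # the padded string " 3" is compared with "0".."4"

# Vectorised alternative that works on the numeric column directly;
# right = FALSE gives intervals [0,5), [5,15), [15,30), [30,70), [70,80), [80,Inf)
ageclass <- cut(mds_demo$age,
                breaks = c(0, 5, 15, 30, 70, 80, Inf),
                labels = c("0-4", "5-14", "15-29", "30-69", "70-79", "80+"),
                right = FALSE)
ageclass
```

cut() returns a factor and should be fast on 123,000 records, since no per-row function call is involved.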

[R] Weighted principal components analysis?

2009-04-03 Thread Alan Cohen
Hello R-ers,

I'm trying to do a weighted principal components analysis.  I couldn't find any 
such option with princomp or prcomp.  Does anyone know of a package or way to 
do this?

More specifically, the observations I'm working with are averages from 
populations of varying sizes.  I thus need to weight the observations by sample 
size.  Ideally I could apply these weights at the cell level (i.e., allowing 
sample size to vary within observations across variables), but even applying 
them just to the observations would get me most of the way there.

I'm using R v2.8.1 on Windows XP.  I've searched Help and the R site and had no 
luck.  Thanks for any help you can provide.
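
One possible route for observation-level weights, sketched under assumptions and not a full answer to the cell-level case: cov.wt() computes a weighted covariance matrix, and princomp() accepts such a covariance list through its covmat argument. The data matrix and sample sizes below are made up for illustration.

```r
set.seed(1)
# Hypothetical data: 10 population averages on 4 variables
x <- matrix(rnorm(40), nrow = 10, ncol = 4)
# Hypothetical sample sizes behind each average
n <- c(5, 10, 20, 40, 80, 5, 10, 20, 40, 80)

# Weighted covariance and weighted centre; weights are scaled to sum to 1
wcov <- cov.wt(x, wt = n / sum(n))

# PCA on the weighted covariance; passing x as well lets princomp()
# compute scores, centred at the weighted means held in wcov$center
pc <- princomp(x, covmat = wcov)
pc$sdev
```

Weights that vary by cell would need a different method; this sketch only covers one weight per observation.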

Cheers,
Alan Cohen
Centre for Global Health Research
Toronto, Ontario



[R] Using apply to get group means

2009-03-31 Thread Alan Cohen
Hi all,

I'm trying to improve my R skills and make my programming more efficient and 
succinct.  I can solve the following question, but wonder if there's a better 
way to do it:

I'm trying to calculate mean by several variables and then put this back into 
the original data set as a new variable.  For example, if I were measuring 
weight, I might want to have each individual's weight, and also the group mean 
by, say, race, sex, and geographic region.  The following code works:

> x1 <- rep(c("A","B","C"),3)
> x2 <- c(rep(1,3),rep(2,3),1,2,1)
> x3 <- c(1,2,3,4,5,6,2,6,4)
> x <- as.data.frame(cbind(x1,x2,x3))
> x3.mean <- rep(0,nrow(x))
> for (i in 1:nrow(x)){
+   x3.mean[i] <- mean(as.numeric(x[,3][x[,1]==x[,1][i] & x[,2]==x[,2][i]]))
+   }
> cbind(x,x3.mean)
  x1 x2 x3 x3.mean
1  A  1  1     1.5
2  B  1  2     2.0
3  C  1  3     3.5
4  A  2  4     4.0
5  B  2  5     5.5
6  C  2  6     6.0
7  A  1  2     1.5
8  B  2  6     5.5
9  C  1  4     3.5

However, I'd love to be able to do this with apply rather than a for-loop.  
Or is there a built-in function? Any suggestions?

Also, any way to avoid the hassles with having to convert to a data frame and 
then again to numeric when one variable is character?
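
A possible built-in route, given as a sketch: ave() returns a vector the same length as its input, with each element replaced by a summary of its group, so the group mean can be added as a new column without a loop. Building the data with data.frame() instead of cbind() also keeps x3 numeric, because cbind() on mixed vectors coerces everything to character first.

```r
# Same example data, built so that numeric columns stay numeric
x <- data.frame(x1 = rep(c("A", "B", "C"), 3),
                x2 = c(rep(1, 3), rep(2, 3), 1, 2, 1),
                x3 = c(1, 2, 3, 4, 5, 6, 2, 6, 4),
                stringsAsFactors = FALSE)

# Mean of x3 within each x1-by-x2 group, repeated for every row of the group
x$x3.mean <- ave(x$x3, x$x1, x$x2, FUN = mean)
x
```

More grouping variables can be added as further arguments before FUN.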

Cheers,
Alan Cohen



[R] Memory limits for large data sets

2008-11-05 Thread Alan Cohen
Hello,

I have several very large data sets (1-7 million observations, sometimes 
hundreds of variables) that I'm trying to work with in R, and memory seems to 
be a big issue.  I'm currently using a 2 GB Windows setup, but might have the 
option to run R on a server remotely.  Windows R seems basically limited to 2 
GB memory if I'm right; is there the possibility to go much beyond that with 
server-based R?  In other words, am I limited by R or by my hardware, and how 
much might R be able to handle if I get the hardware necessary?

Also, any possibility of using web-based R for this kind of thing?

Cheers,
Alan Cohen

Alan Cohen
Post-doctoral Fellow
Centre for Global Health Research
70 Richmond St. East, Suite 202A
Toronto, ON M5C 1N8
Canada
(416) 854-3121 (cell)
(416) 864-6060 ext. 3156 (office)
