RE: [R] beginner programming question
Thank you all! I did it, and it worked just fine. Over the last week I've been torturing the syntax in various ways until it finally all became clear. The subscripting solution opened new doors for me. The reshape command in particular gave me about three days of headache. I read its help page about 20 times trying to figure out how to use it; the trouble was that the help page has no example of reshaping several sets of varying variables at once, and it doesn't mention that the names of the new variables in the long format have to be given as a vector in the v.names argument. Anyway, the syntax is:

> x <- read.table("clipboard", header=TRUE)
> x
  rel1 rel2 rel3 age0 age1 age2 age3 sex0 sex1 sex2 sex3
1    1    3   NA   25   23    2   NA    1    2    1   NA
2    4    1    3   35   67   34   10    2    2    1    2
3    1    4    4   39   40   59   60    1    2    2    1
4    4   NA   NA   45   70   NA   NA    2    2   NA   NA
> xx <- reshape(x, varying=list(names(x)[1:3], names(x)[5:7],
+               names(x)[9:11]), v.names=c("rel", "age", "sex"), direction="long")
> xx
    age0 sex0 time rel age sex id
1.1   25    1    1   1  23   2  1
2.1   35    2    1   4  67   2  2
3.1   39    1    1   1  40   2  3
4.1   45    2    1   4  70   2  4
1.2   25    1    2   3   2   1  1
2.2   35    2    2   1  34   1  2
3.2   39    1    2   4  59   2  3
4.2   45    2    2  NA  NA  NA  4
1.3   25    1    3  NA  NA  NA  1
2.3   35    2    3   3  10   2  2
3.3   39    1    3   4  60   1  3
4.3   45    2    3  NA  NA  NA  4
> xx <- subset(xx, xx$rel==1)
> rbind(subset(xx, xx$sex0==1)[,c("age0","age")],
+       subset(xx, xx$sex==1)[,c("age","age0")])
    age0 age
1.1   25  23
3.1   39  40
2.2   35  34

I wish you a Merry Xmas; you are a truly great community.
Adrian

-----Original Message-----
From: Thomas Lumley [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 18, 2003 5:53 PM
To: Tony Plate
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [R] beginner programming question

On Wed, 17 Dec 2003, Tony Plate wrote:
Another way to approach this is to first massage the data into a more regular format. This may or may not be simpler or faster than other solutions suggested.

You could also use the reshape() command to do the massaging
-thomas

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
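A self-contained sketch of the reshape() route, with the toy data typed in directly instead of read from the clipboard. One thing worth checking: rbind() on data frames matches columns by name, not by position, so selecting c("age", "age0") for the second piece does not swap anything. That is why row 2.2 above still reads 35 34 rather than the 34 35 asked for in the original question. Choosing the husband's and wife's ages with ifelse() on sex0 avoids the problem. The sketch assumes one spouse entry per household; the names sp, naive, husband and res are illustrative only.

```r
# Toy data from the thread (rel code 1 = spouse, sex code 1 = male)
x <- data.frame(rel1 = c(1, 4, 1, 4), rel2 = c(3, 1, 4, NA), rel3 = c(NA, 3, 4, NA),
                age0 = c(25, 35, 39, 45), age1 = c(23, 67, 40, 70),
                age2 = c(2, 34, 59, NA), age3 = c(NA, 10, 60, NA),
                sex0 = c(1, 2, 1, 2), sex1 = c(2, 2, 2, 2),
                sex2 = c(1, 1, 2, NA), sex3 = c(NA, 2, 1, NA))

# Wide -> long: one list element per set of varying columns,
# and one v.names entry per set, in the same order
xx <- reshape(x,
              varying = list(c("rel1", "rel2", "rel3"),
                             c("age1", "age2", "age3"),
                             c("sex1", "sex2", "sex3")),
              v.names = c("rel", "age", "sex"),
              direction = "long")

# Spouse rows only; subset() treats NA conditions as FALSE
sp <- subset(xx, rel == 1)
sp <- sp[order(sp$id), ]

# rbind() matches data-frame columns by name, so row 2.2 is NOT swapped here
naive <- rbind(subset(sp, sex0 == 1)[, c("age0", "age")],
               subset(sp, sex == 1)[, c("age", "age0")])
print(naive)

# Pick the husband's and wife's ages explicitly instead
husband <- sp$sex0 == 1   # TRUE when the respondent is the husband
res <- data.frame(ageh = ifelse(husband, sp$age0, sp$age),
                  agew = ifelse(husband, sp$age, sp$age0))
print(res)
```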
Re: [R] beginner programming question
On Wed, 17 Dec 2003, Tony Plate wrote:
Another way to approach this is to first massage the data into a more regular format. This may or may not be simpler or faster than other solutions suggested.

You could also use the reshape() command to do the massaging

-thomas

Thomas Lumley                 Assoc. Professor, Biostatistics
[EMAIL PROTECTED]             University of Washington, Seattle
RE: [R] beginner programming question
Define a function f that takes as input a vector representing a single input row. f should either (1) transform it into a vector representing the required row of output, or (2) return NULL if no output row is to be produced for that input row. Then use this code, where z is your input matrix (the 2 is the number of values in each output row):

t( matrix( unlist( apply( z, 1, f ) ), 2) )
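For concreteness, here is one possible f for the toy data. It is a hypothetical helper, not code posted in the thread: it finds the first rel column equal to 1, returns c(husband's age, wife's age) using sex0 to decide the order, and returns NULL when there is no spouse. Note that apply() converts a data frame to a matrix first, which is harmless here because every column is numeric.

```r
z <- data.frame(rel1 = c(1, 4, 1, 4), rel2 = c(3, 1, 4, NA), rel3 = c(NA, 3, 4, NA),
                age0 = c(25, 35, 39, 45), age1 = c(23, 67, 40, 70),
                age2 = c(2, 34, 59, NA), age3 = c(NA, 10, 60, NA),
                sex0 = c(1, 2, 1, 2), sex1 = c(2, 2, 2, 2),
                sex2 = c(1, 1, 2, NA), sex3 = c(NA, 2, 1, NA))

# Hypothetical per-row helper: returns c(ageh, agew), or NULL if no spouse
f <- function(row) {
  k <- which(row[c("rel1", "rel2", "rel3")] == 1)  # which() skips NAs
  if (length(k) == 0) return(NULL)                # no spouse: no output row
  resp   <- row[["age0"]]
  spouse <- row[[paste0("age", k[1])]]            # age1, age2 or age3
  if (row[["sex0"]] == 1) c(resp, spouse) else c(spouse, resp)
}

# The one-liner: apply() returns a list here because one row gives NULL
res <- t(matrix(unlist(apply(z, 1, f)), 2))
colnames(res) <- c("ageh", "agew")
print(res)
```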
RE: [R] beginner programming question
From: Gabor Grothendieck [EMAIL PROTECTED]
Date: Wed, 17 Dec 2003 15:02:49 -0500 (EST)

Define function f to take a vector as input representing a single input row. f should (1) transform this to a vector representing the required row of output or else (2) produce NULL if no row is to be output for that input row. Then use this code where z is your input matrix:

t( matrix( unlist( apply( z, 1, f ) ), 2) )

But as has been pointed out recently, apply really is still just a for loop.

From: Adrian Dusa [EMAIL PROTECTED]
Date: Wed, 17 Dec 2003 21:28:05 +0200

I have a (rather theoretical) programming problem for which I have found a solution, but I feel it is a rather poor one. I wonder if there's some other (more clever) solution, using (maybe?) vectorization or subscripting.

Here is a subscripting solution, where (for consistency with above) z is your data [from read.table(filename, header=T)]:

> z
  rel1 rel2 rel3 age0 age1 age2 age3 sex0 sex1 sex2 sex3
1    1    3   NA   25   23    2   NA    1    2    1   NA
2    4    1    3   35   67   34   10    2    2    1    2
3    1    4    4   39   40   59   60    1    2    2    1
4    4   NA   NA   45   70   NA   NA    2    2   NA   NA
> res <- matrix(NA, nrow=nrow(z), ncol=2,
+               dimnames=list(rownames=rownames(z), colnames=c("ageh", "agew")))
> w <- w0 <- w1 <- w2 <- which(z[, c("rel1", "rel2", "rel3")] == 1, arr.ind=TRUE)  # (row, col) of spouse entries
> w0[, 2] <- z[, "sex0"][w[, 1]]  # res column for respondent's age: 1 if sex0 is 1, else 2
> w1[, 2] <- 3 - w0[, 2]          # res column for spouse's age: the other one
> w2[, 2] <- 4 + w[, 2]           # z column holding spouse's age (age1..age3 are columns 5..7)
> res[w0] <- z[, "age0"][w[, 1]]  # set respondent's age
> res[w1] <- z[w2]                # set spouse's age
> res
        colnames
rownames ageh agew
       1   25   23
       2   34   35
       3   39   40
       4   NA   NA

Ray Brownrigg
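The same matrix-indexing idea can also be written with the named "row"/"col" output of which(arr.ind = TRUE) and ifelse(), which some readers may find easier to follow. This is a sketch on the toy data, assuming spouse entries are coded 1 in rel1..rel3 and that sex0 == 1 marks a male respondent; a household with two spouse entries would produce two output rows. The names rel, ages, hit and res are illustrative.

```r
z <- data.frame(rel1 = c(1, 4, 1, 4), rel2 = c(3, 1, 4, NA), rel3 = c(NA, 3, 4, NA),
                age0 = c(25, 35, 39, 45), age1 = c(23, 67, 40, 70),
                age2 = c(2, 34, 59, NA), age3 = c(NA, 10, 60, NA),
                sex0 = c(1, 2, 1, 2), sex1 = c(2, 2, 2, 2),
                sex2 = c(1, 1, 2, NA), sex3 = c(NA, 2, 1, NA))

rel  <- as.matrix(z[, c("rel1", "rel2", "rel3")])
ages <- as.matrix(z[, c("age1", "age2", "age3")])

# (row, col) positions of spouse entries; which() ignores NAs
hit <- which(rel == 1, arr.ind = TRUE)
hit <- hit[order(hit[, "row"]), , drop = FALSE]   # one output row per household, in order

resp   <- z$age0[hit[, "row"]]   # respondent's age for each spouse entry
spouse <- ages[hit]              # matrix indexing by (row, col) pairs
male   <- z$sex0[hit[, "row"]] == 1

res <- cbind(ageh = ifelse(male, resp, spouse),
             agew = ifelse(male, spouse, resp))
print(res)
```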
RE: [R] beginner programming question
This is just a response to the part where you say that an apply loop is really a for loop. In a sense this is true, but the apply solution nevertheless has a number of advantages over for:

- it neatly separates the problem into a single line that is independent of the details of the problem, and localizes those details in f
- the rows are pasted together automatically, avoiding messy appending or the creation and filling in of a structure
- it avoids the use of indices

Of course, some apply loops come pretty close to for loops. For example, consider this variation:

t( matrix( unlist( sapply( 1:nrow(z), function(i) f(z[i,]) ) ), 2 ) )

and compare it to the for loop:

out <- NULL
for ( i in 1:nrow(z) ) {
    v <- f( z[i,] )
    if ( ! is.null(v) ) out <- rbind( out, v )
}

Even this apply, which is clearly inferior to the one in my original posting, retains the first two advantages listed.

--- Date: Thu, 18 Dec 2003 10:04:52 +1300 (NZDT)
From: Ray Brownrigg [EMAIL PROTECTED]
To: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: RE: [R] beginner programming question

But as has been pointed out recently, apply really is still just a for loop.
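A sketch contrasting the two loop shapes discussed above, using a hypothetical f of the kind the one-liner needs (defined here so the block runs on its own). Growing out with rbind() builds a new, larger matrix on every iteration; collecting rows in a preallocated list and binding once with do.call(rbind, ...) avoids that, and rbind() silently drops the NULL entries. Note that rows[i] <- list(...) is used rather than rows[[i]] <- ..., because assigning NULL with [[<- would delete the list element.

```r
z <- data.frame(rel1 = c(1, 4, 1, 4), rel2 = c(3, 1, 4, NA), rel3 = c(NA, 3, 4, NA),
                age0 = c(25, 35, 39, 45), age1 = c(23, 67, 40, 70),
                age2 = c(2, 34, 59, NA), age3 = c(NA, 10, 60, NA),
                sex0 = c(1, 2, 1, 2), sex1 = c(2, 2, 2, 2),
                sex2 = c(1, 1, 2, NA), sex3 = c(NA, 2, 1, NA))

# Hypothetical per-row helper: c(ageh, agew), or NULL if no spouse
f <- function(row) {
  k <- which(row[c("rel1", "rel2", "rel3")] == 1)
  if (length(k) == 0) return(NULL)
  resp   <- row[["age0"]]
  spouse <- row[[paste0("age", k[1])]]
  if (row[["sex0"]] == 1) c(resp, spouse) else c(spouse, resp)
}

# Loop 1: grow the result (each rbind() creates a new, larger 'out')
out <- NULL
for (i in seq_len(nrow(z))) {
  v <- f(unlist(z[i, ]))            # unlist() turns the 1-row data frame into a named vector
  if (!is.null(v)) out <- rbind(out, v)
}

# Loop 2: fill a preallocated list, then bind once
rows <- vector("list", nrow(z))
for (i in seq_len(nrow(z))) rows[i] <- list(f(unlist(z[i, ])))  # keeps NULL slots
out2 <- do.call(rbind, rows)        # NULL elements are dropped by rbind()

# apply() version from the thread
out3 <- t(matrix(unlist(apply(z, 1, f)), 2))
print(out3)
```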
Re: [R] beginner programming question
Another way to approach this is to first massage the data into a more regular format. This may or may not be simpler or faster than other solutions suggested.

> x <- read.table("clipboard", header=TRUE)
> x
  rel1 rel2 rel3 age0 age1 age2 age3 sex0 sex1 sex2 sex3
1    1    3   NA   25   23    2   NA    1    2    1   NA
2    4    1    3   35   67   34   10    2    2    1    2
3    1    4    4   39   40   59   60    1    2    2    1
4    4   NA   NA   45   70   NA   NA    2    2   NA   NA
> nn <- c("rel","age0","age","sex0","sex")
> xx <- rbind(`colnames<-`(x[,c("rel1","age0","age1","sex0","sex1")], nn),
+             `colnames<-`(x[,c("rel2","age0","age2","sex0","sex2")], nn),
+             `colnames<-`(x[,c("rel3","age0","age3","sex0","sex3")], nn))
> xx
   rel age0 age sex0 sex
1    1   25  23    1   2
2    4   35  67    2   2
3    1   39  40    1   2
4    4   45  70    2   2
11   3   25   2    1   1
21   1   35  34    2   1
31   4   39  59    1   2
41  NA   45  NA    2  NA
12  NA   25  NA    1  NA
22   3   35  10    2   2
32   4   39  60    1   1
42  NA   45  NA    2  NA
> rbind(subset(xx, xx$rel==1 & (xx$sex0==1 | xx$sex0==xx$sex))[,c("age0","age")],
+       subset(xx, xx$rel==1 & xx$sex==1 & xx$sex0!=xx$sex)[,c("age","age0")])
   age0 age
1    25  23
3    39  40
21   35  34

hope this helps,

Tony Plate

PS. To advanced R users: Is the above usage of the colnames<- function within an expression regarded as acceptable or as undesirable programming style? I've rarely seen it used, but it can be quite useful.

At Wednesday 09:28 PM 12/17/2003 +0200, Adrian Dusa wrote:
Hi all,

The recent e-mails about beginners gave me the courage to post a question; from a beginner's perspective, there are a lot of questions I'm tempted to ask. But I try to find the answers in the documentation, in the roughly 15 free books I have, or in the help archives (where I have often found similar questions posted in the past). As a (still current) user of SPSS, I'd like to be able to do everything in R. I've learned that the best way to get there is to struggle until I find a solution, no matter what, rather than fall back on SPSS. I've become more and more aware of the almost unlimited possibilities that R offers, and I'd like to switch to R completely as soon as I think I'm ready.

I have a (rather theoretical) programming problem for which I have found a solution, but I feel it is a rather poor one. I wonder whether there is some other, more clever solution, maybe using vectorization or subscripting. A toy example would be:

rel1 rel2 rel3 age0 age1 age2 age3 sex0 sex1 sex2 sex3
   1    3   NA   25   23    2   NA    1    2    1   NA
   4    1    3   35   67   34   10    2    2    1    2
   1    4    4   39   40   59   60    1    2    2    1
   4   NA   NA   45   70   NA   NA    2    2   NA   NA

where rel1...rel3 give each person's kinship with the respondent (person 0): code 1 means husband/wife, code 4 parent and code 3 child. I would like to get the husband's age (sex code 1) in the first column and the wife's age in the second:

ageh agew
  25   23
  34   35
  39   40

My solution uses *for* loops and *if*s: it checks each element of the first three columns for code 1, then checks the last three columns for the husband's code, and copies the corresponding age into a new matrix. I've learned that *for* loops are very slow (and indeed, with my dataset of some 2000 rows and 13 kinship columns, it takes quite a while). I found the Looping chapter in S Poetry very useful (it has saved me from *for* loops a couple of times, thanks!).

Any hints would be appreciated,
Adrian

Adrian Dusa ([EMAIL PROTECTED])
Romanian Social Data Archive (www.roda.ro)
1, Schitu Magureanu Bd.
76625 Bucharest sector 5
Romania
Tel./Fax: +40 (21) 312.66.18 / +40 (21) 312.02.10, int.101
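A self-contained sketch of the stack-then-filter approach. One caution: rbind() on data frames matches columns by name rather than by position, so taking c("age", "age0") for the second subset does not put the husband's age first; row 21 in the output above still reads 35 34. Giving both pieces the same new names with setNames() (from the stats package) before binding makes the intended order stick. The column names ageh and agew follow the original question; h1, h2 and res are illustrative names.

```r
x <- data.frame(rel1 = c(1, 4, 1, 4), rel2 = c(3, 1, 4, NA), rel3 = c(NA, 3, 4, NA),
                age0 = c(25, 35, 39, 45), age1 = c(23, 67, 40, 70),
                age2 = c(2, 34, 59, NA), age3 = c(NA, 10, 60, NA),
                sex0 = c(1, 2, 1, 2), sex1 = c(2, 2, 2, 2),
                sex2 = c(1, 1, 2, NA), sex3 = c(NA, 2, 1, NA))

# Stack the three (rel, age, sex) blocks under common names
nn <- c("rel", "age0", "age", "sex0", "sex")
xx <- rbind(setNames(x[, c("rel1", "age0", "age1", "sex0", "sex1")], nn),
            setNames(x[, c("rel2", "age0", "age2", "sex0", "sex2")], nn),
            setNames(x[, c("rel3", "age0", "age3", "sex0", "sex3")], nn))

# Rows where the respondent's age belongs first (sex0 == 1, or both coded alike)
h1 <- subset(xx, rel == 1 & (sex0 == 1 | sex0 == sex))[, c("age0", "age")]
# Rows where the spouse is the husband: spouse's age first
h2 <- subset(xx, rel == 1 & sex == 1 & sex0 != sex)[, c("age", "age0")]

# Rename before rbind(), which would otherwise realign h2 by its old names
res <- rbind(setNames(h1, c("ageh", "agew")),
             setNames(h2, c("ageh", "agew")))
print(res)
```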
Re: [R] beginner programming question
Tony Plate [EMAIL PROTECTED] writes:

> xx <- rbind(`colnames<-`(x[,c("rel1","age0","age1","sex0","sex1")], nn),
+             `colnames<-`(x[,c("rel2","age0","age2","sex0","sex2")], nn),
+             `colnames<-`(x[,c("rel3","age0","age3","sex0","sex3")], nn))

PS. To advanced R users: Is the above usage of the colnames<- function within an expression regarded as acceptable or as undesirable programming style? I've rarely seen it used, but it can be quite useful.

I wouldn't be happy with it. These assignment functions can do things that really only make sense when used in the context of foo(x) <- bar. It is true that if you define foo<- as an ordinary R function of x and bar that returns the modified x, then foo(x) <- bar will work, but the converse might not be true. The programmer may have done things for the sake of efficiency that make foo<- behave in non-standard ways. In particular, it might destructively modify its x argument. In the above case the modified argument is a temporary, so it is likely to be safe, but as a programming paradigm it might spring some nasty surprises on the unsuspecting user. So I'd prefer something like

xx <- do.call("rbind",
              lapply(list(x[,c("rel1","age0","age1","sex0","sex1")],
                          x[,c("rel2","age0","age2","sex0","sex2")],
                          x[,c("rel3","age0","age3","sex0","sex3")]),
                     function(x) {colnames(x) <- nn; x}))

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                     FAX: (+45) 35327907
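A runnable version of the do.call()/lapply() suggestion above, building the three column sets with paste0() instead of typing them out (an illustrative change, not part of the original post). It also checks that stats::setNames(), an ordinary function whose body is essentially names(object) <- nm; object, gives the same result as the anonymous function; for a data frame, colnames<- simply sets the names.

```r
x <- data.frame(rel1 = c(1, 4, 1, 4), rel2 = c(3, 1, 4, NA), rel3 = c(NA, 3, 4, NA),
                age0 = c(25, 35, 39, 45), age1 = c(23, 67, 40, 70),
                age2 = c(2, 34, 59, NA), age3 = c(NA, 10, 60, NA),
                sex0 = c(1, 2, 1, 2), sex1 = c(2, 2, 2, 2),
                sex2 = c(1, 1, 2, NA), sex3 = c(NA, 2, 1, NA))
nn <- c("rel", "age0", "age", "sex0", "sex")

# One data frame per relative k = 1, 2, 3
blocks <- lapply(1:3, function(k)
  x[, c(paste0("rel", k), "age0", paste0("age", k), "sex0", paste0("sex", k))])

# Rename inside an ordinary function, then stack
xx1 <- do.call("rbind", lapply(blocks, function(b) { colnames(b) <- nn; b }))

# Same thing with setNames(), which wraps names(object) <- nm
xx2 <- do.call("rbind", lapply(blocks, setNames, nn))

stopifnot(identical(xx1, xx2))
print(xx1)
```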
Re: [R] beginner programming question
Thanks. As a follow-up question, is it considered acceptable programming practice for <- functions to modify their x argument?

-- Tony Plate

At Thursday 12:23 AM 12/18/2003 +0100, Peter Dalgaard wrote:
The programmer may have done things for the sake of efficiency that make foo<- behave in non-standard ways. In particular it might destructively modify its x argument.
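A small illustration of the rewriting discussed above, using a hypothetical replacement function second<- written in R. The R Language Definition describes a call such as second(v) <- 10L as being evaluated roughly like v <- `second<-`(v, value = 10L), via a temporary. A replacement function written in R works on its own copy of x, so calling it directly leaves the caller's object untouched. The concern raised in the thread is about implementations (for example in C) that bypass this for efficiency; this sketch does not exercise those.

```r
# Hypothetical replacement function: sets the second element
`second<-` <- function(x, value) {
  x[2] <- value   # modifies the function's local copy of x only
  x
}

v <- 1:3

# Called as an ordinary function: returns a modified copy
w <- `second<-`(v, 10L)
stopifnot(identical(v, 1:3))   # caller's v is unchanged
print(w)

# Used in replacement form: v itself is rebound to the result
second(v) <- 10L
print(v)
```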