*Combining 2 columns into 1 column many times in a very large dataset*

The solutions I am working on are clumsy and, even if I can get them to work, will not be fast. The true dataset is ~1500 x 45000, so whatever I use needs to be efficient. I've searched the R help files and the archives for this list and have some possibly workable solutions for 2) and 3), but nothing for my question 1). I include 2) and 3) anyway in case anyone can recommend an efficient approach.
Here is a toy example of the data structure:

n = 10  # number of rows; the SNP vectors below each have 10 entries
pop = data.frame(status = rbinom(n, 1, .42),
                 sex = rbinom(n, 1, .5),
                 age = round(rnorm(n, mean=40, sd=10)),
                 disType = rbinom(n, 1, .2),
                 rs123=c(1,3,1,3,3,1,1,1,3,1),
                 rs123.1=rep(1, n),
                 rs157=c(2,4,2,2,2,4,4,4,2,2),
                 rs157.1=c(4,4,4,2,4,4,4,4,2,2),
                 rs132=c(4,4,4,4,4,4,4,4,2,2),
                 rs132.1=c(4,4,4,4,4,4,4,4,4,4))

Thus, there are a few columns of basic demographic info, and the rest of the columns are biallelic SNP info. Ex: rs123 is the first allele of rs123 and rs123.1 is the second allele of rs123.

1) I need to merge all the biallelic SNP data that is currently in 2 columns into 1 column (but within the dataset). For example, rs123 and rs123.1 become one column:

   11 31 11 31 31 11 11 11 31 11

2) I need to identify the least frequent SNP value (in the above example it is 31).

3) I need to replace the least frequent SNP value with 1 and the other(s) with 0.

Thank you for any assistance,
-S.R.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
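
One possible way to approach steps 1) to 3) is sketched below. It is only a sketch under stated assumptions, not a tested solution for the full ~1500 x 45000 data: it assumes n = 10 (taken from the length of the SNP vectors), that every second-allele column is named "<snp>.1" with a matching "<snp>" column, and that allele codes are single digits so that 3 and 1 can be combined as 3 * 10 + 1 = 31. The names `geno`, `recode_rare`, `recoded` and `result` are made up for illustration.

```r
# Toy data from the post; n = 10 is an assumption based on the vector lengths.
set.seed(1)
n <- 10
pop <- data.frame(
  status  = rbinom(n, 1, .42),
  sex     = rbinom(n, 1, .5),
  age     = round(rnorm(n, mean = 40, sd = 10)),
  disType = rbinom(n, 1, .2),
  rs123   = c(1, 3, 1, 3, 3, 1, 1, 1, 3, 1),
  rs123.1 = rep(1, n),
  rs157   = c(2, 4, 2, 2, 2, 4, 4, 4, 2, 2),
  rs157.1 = c(4, 4, 4, 2, 4, 4, 4, 4, 2, 2),
  rs132   = c(4, 4, 4, 4, 4, 4, 4, 4, 2, 2),
  rs132.1 = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4)
)

# Assumption: second-allele columns end in ".1" and their partner column
# has the same name without the suffix.
second <- grep("\\.1$", names(pop), value = TRUE)
first  <- sub("\\.1$", "", second)

# 1) Combine each allele pair into one genotype code (3 and 1 -> 31).
#    All SNPs are handled in a single vectorised matrix operation,
#    with no loop over columns. Only valid for single-digit allele codes.
geno <- as.matrix(pop[first]) * 10 + as.matrix(pop[second])
colnames(geno) <- first

# 2) and 3) Per SNP, find the least frequent genotype and recode it as 1,
#    all other genotypes as 0. On a tie, which.min() takes the first
#    (numerically smallest) genotype code.
recode_rare <- function(g) {
  counts <- table(g)
  rare <- as.numeric(names(counts)[which.min(counts)])
  as.integer(g == rare)
}
recoded <- apply(geno, 2, recode_rare)

# Demographic columns plus one recoded column per SNP.
result <- cbind(pop[setdiff(names(pop), c(first, second))], recoded)

geno[, "rs123"]     # 11 31 11 31 31 11 11 11 31 11
recoded[, "rs123"]  # 0 1 0 1 1 0 0 0 1 0
```

Note that this treats allele order as meaningful (24 and 42 would count as different genotypes), and a SNP with only one observed genotype would be recoded to all 1s. For the full dataset the two allele matrices may be large, so running the same code over blocks of SNP columns could be worth trying.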