Dear R-list,

I'm working with a data frame whose dimensions are

> dim(GERU)
[1] 3468  318

and looks like

> GERU[1:10,1:10]
       ped ind par1 par2 sex sta rs7696470 rs7696470.1 rs1032896 rs1032896.1
1  USA5854   2    0    0   2   1         4           4         1           1
2  USA5854   3    1    2   1   1         4           4         1           1
3  USA5854   4    1    2   2   2         1           4         1           3
4  USA5854   5    1    2   1   2         4           2         2           1
5  USA5855   1    0    0   1   1         0           0         0           0
6  USA5855   2    0    0   2   2         1           0         0           0
7  USA5855   3    1    2   1   2         0           2         0           0
8  USA5855   4    1    2   1   1         2           0         2           1
9  USA5855   5    1    2   1   2         0           1         0           0
10 USA5856   1    0    0   1   1         3           3         3           3

What I would like to do is:

1. Identify which columns (from 7 to 318) have more than 4 categories (I
solved that). In GERU these would be rs7696470 and rs7696470.1.
2. In the columns from step 1, replace every entry equal to 2 with 3. For
example, rs7696470 would become 4,4,1,4,0,1,0,3,0,3 and so on.
3. Once the entries are replaced, write the modified columns back into GERU.

Here is what I've done:

> # Function to flag columns with more than 4 categories
> tx=function(x) ifelse(dim(table(x))>4,1,0)

> # Identifying the columns
> M4=apply(GUPN[,-c(1:6)],2,tx)
> names(which(M4==1))                    # Step 1
 [1] "rs335322"     "rs335322.1"   "rs186750"     "rs186750.1"   "rs1565901"    "rs1565901.1"  "rs1565902"
 [8] "rs1565902.1"  "rs11131334"   "rs11131334.1" "rs1948616"    "rs1948616.1"  "rs4484334"    "rs4484334.1"
[15] "rs1497921"    "rs1497921.1"  "rs1391320"    "rs1391320.1"  "rs1497913"    "rs1497913.1"  "rs996208"
[22] "rs996208.1"
> # Step 2
> REPLACE=GUPN[,names(which(M4==1))]
> RES=apply(REPLACE,2,function(x) ifelse(x==2,3,x))
> RES[1:10,1:5]
   rs335322 rs335322.1 rs186750 rs186750.1 rs1565901
1         1          3        3          3         3
2         1          1        3          3         3
3         3          3        1          3         3
4         1          3        3          3         3
5         0          0        0          0         0
6         0          0        0          0         0
7         0          0        0          0         0
8         0          0        0          0         0
9         0          0        0          0         0
10        1          3        3          3         1

Now, the problem I have is replacing the columns in GERU with the columns in
RES (step 3). In the end, the dimensions of the new data set should still be
3468x318. Any help would be greatly appreciated.
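
To make the question reproducible, here is a small sketch of the three steps on a made-up data frame. The names toy, snp_cols, many and target, and all the values, are invented for illustration and are not taken from GERU. Assigning back by column name might be one way to do step 3, but I am not sure it is the best approach. It also assumes there are no NAs in the marker columns.

```r
# Toy stand-in for the real data (made-up values, for illustration only)
toy <- data.frame(
  ped    = c("A", "A", "A", "B", "B", "B"),
  sta    = c(1, 1, 2, 2, 1, 2),
  snp1   = c(4, 4, 1, 2, 0, 3),   # 5 distinct values: 0,1,2,3,4
  snp1.1 = c(4, 2, 4, 0, 1, 3),   # 5 distinct values
  snp2   = c(1, 1, 1, 2, 0, 0),   # 3 distinct values
  stringsAsFactors = FALSE
)

# Marker columns start after the identifier columns (column 3 onwards here)
snp_cols <- 3:ncol(toy)

# Step 1: flag marker columns with more than 4 distinct values
many   <- sapply(toy[snp_cols], function(x) length(unique(x)) > 4)
target <- names(many)[many]

# Steps 2 and 3: recode 2 as 3 and write the result back by column name,
# so the other columns and the overall dimensions stay unchanged
toy[target] <- lapply(toy[target], function(x) ifelse(x == 2, 3, x))

dim(toy)
```

With the real data, the same assignment by name would presumably be something like GERU[, colnames(RES)] <- RES, given that RES keeps the original column names.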

Thank you so much,


Jorge
