Dear all,

I have a partial data set with four columns.  The first column is "site", a
factor with three levels (A, B, and C).  The second to fourth columns
(v1 to v3) hold my observations, in which "." indicates a missing value.
I converted the data in two steps.  First, I replaced "." with NA; then I
changed each variable from factor to numeric using
"df1$v1 <- as.numeric(df1$v1)".  The second step is manageable with only a
few variables, but it is painful when I have a lot of them.

My question is: is there a more efficient way to convert this kind of
large-scale data?  In short, I am looking for an alternative to STEP 2, or to
the whole procedure if there is one.

Any comment will be highly appreciated.

Thank you in advance!!

Steve

P.S.: Below is an example of what I did.

STEP 1
> df1
   site   v1  v2 v3
1     A   10   5  .
2     A   22  54  .
3     A   44 214  2
4     A  521  14  4
5     A    5  73  1
6     A 1654 0.4  4
7     B   16   1  .
8     B    .   4  5
9     B    .   .  4
10    B    .   4  1
11    B   51   .  2
12    B    5   .  .
13    C    1 0.4  .
14    C    0   4  .
15    C    1   1  4
16    C   40   .  7
17    C    4   .  7
18    C   10   .  1
> str(df1)
'data.frame':   18 obs. of  4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 1 1 1 ...
 $ v2  : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 1 6 ...
 $ v3  : Factor w/ 6 levels ".","1","2","4",..: 1 1 3 4 2 4 1 5 4 2 ...
> df1[df1=="."] <- "NA"
Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = "NA") :
  invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = "NA") :
  invalid factor level, NAs generated
3: In `[<-.factor`(`*tmp*`, thisvar, value = "NA") :
  invalid factor level, NAs generated
> df1
   site   v1   v2   v3
1     A   10    5 <NA>
2     A   22   54 <NA>
3     A   44  214    2
4     A  521   14    4
5     A    5   73    1
6     A 1654  0.4    4
7     B   16    1 <NA>
8     B <NA>    4    5
9     B <NA> <NA>    4
10    B <NA>    4    1
11    B   51 <NA>    2
12    B    5 <NA> <NA>
13    C    1  0.4 <NA>
14    C    0    4 <NA>
15    C    1    1    4
16    C   40 <NA>    7
17    C    4 <NA>    7
18    C   10 <NA>    1
> str(df1)
'data.frame':   18 obs. of  4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 NA NA NA ...
 $ v2  : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 NA 6 ...
 $ v3  : Factor w/ 6 levels ".","1","2","4",..: NA NA 3 4 2 4 NA 5 4 2 ...
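
If the data come from a text file, STEP 1 might be avoidable altogether by
declaring "." as the missing-value marker at read time.  The following is only
a minimal sketch under that assumption: the name df_demo and the inline text
are hypothetical stand-ins for the real file, which is not shown here.

```r
# Assumption: the data live in a whitespace-delimited text file. The
# `text =` argument stands in for that file so the sketch is self-contained.
raw <- "site v1 v2 v3
A 10 5 .
A 1654 0.4 4
B . 4 5
B . . 4
C 40 . 7"

df_demo <- read.table(text = raw, header = TRUE,
                      na.strings = ".",         # treat "." as missing
                      stringsAsFactors = TRUE)  # keep site as a factor

# v1, v2 and v3 should now be numeric columns containing NA, not factors.
str(df_demo)
```

With a real file, the first argument would be the file name instead of
`text = raw`.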

STEP 2

> df1$v1 <- as.numeric(df1$v1)
> df1$v2 <- as.numeric(df1$v2)
> df1$v3 <- as.numeric(df1$v3)
> df1
   site v1 v2 v3
1     A  4  7 NA
2     A  7  8 NA
3     A 10  5  3
4     A 13  4  4
5     A 11  9  2
6     A  6  2  4
7     B  5  3 NA
8     B NA  6  5
9     B NA NA  4
10    B NA  6  2
11    B 12 NA  3
12    B 11 NA NA
13    C  3  2 NA
14    C  2  6 NA
15    C  3  3  4
16    C  9 NA  6
17    C  8 NA  6
18    C  4 NA  2
> str(df1)
'data.frame':   18 obs. of  4 variables:
 $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
 $ v1  : num  4 7 10 13 11 6 5 NA NA NA ...
 $ v2  : num  7 8 5 4 9 2 3 6 NA 6 ...
 $ v3  : num  NA NA 3 4 2 4 NA 5 4 2 ...
>
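
One thing to note about the output above: as.numeric() applied to a factor
returns the internal level codes, not the printed values, which is why v1 in
row 1 changed from 10 to 4.  Below is a minimal sketch of a possible
alternative that converts every observation column in one call by going
through as.character() first.  The names df_demo and vars are hypothetical,
and the small data frame only imitates the structure of df1.

```r
# Small stand-in for df1: observation columns stored as factors,
# with "." marking missing values.
df_demo <- data.frame(
  site = factor(c("A", "A", "B", "B", "C")),
  v1   = factor(c("10", "1654", ".", "16", "0")),
  v2   = factor(c("5", "0.4", "4", ".", "4")),
  v3   = factor(c(".", "4", "5", "4", "."))
)

# Every column except site is treated as an observation column.
vars <- setdiff(names(df_demo), "site")

df_demo[vars] <- lapply(df_demo[vars], function(x) {
  x <- as.character(x)  # use the labels, not the level codes
  x[x == "."] <- NA     # a real NA, not the string "NA"
  as.numeric(x)
})

str(df_demo)
```

The same lapply() call works unchanged for any number of columns, since vars
is built from the column names rather than typed out by hand.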

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
