Hi Steve,
Here is a suggestion using your original df1:

# Create a copy (optional; you can also work on df1 directly)
newdf1 <- df1

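# Process: convert columns 2 to 4 to numeric in one call.
# apply() first turns the selected columns into a character matrix, so
# as.numeric() sees the printed values; any "." becomes NA.
newdf1[,2:4] <- apply(newdf1[,2:4], 2, function(x) as.numeric(x))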

# Removing df1
rm(df1)

# Result
newdf1

# Structure of the result
str(newdf1)
# 'data.frame':   18 obs. of  4 variables:
#  $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
#  $ v1  : num  10 22 44 521 5 ...
#  $ v2  : num  5 54 214 14 73 0.4 1 4 NA 4 ...
#  $ v3  : num  NA NA 2 4 1 4 NA 5 4 1 ...
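
One caveat about STEP 2 in the quoted message: as.numeric() applied directly to a factor returns the internal level codes, not the printed values, which is why 10 turned into 4 there. Below is a minimal sketch of a per-column alternative that goes through as.character() first. The small data frame `d` is a made-up stand-in for df1, not your real data, and the column positions are an assumption.

```r
# Hypothetical stand-in for df1: factor columns with "." as the missing marker
d <- data.frame(site = factor(c("A", "A", "B")),
                v1   = factor(c("10", ".", "521")),
                v2   = factor(c("0.4", "5", ".")))

# as.numeric() on a factor gives the integer level codes, not the values
as.numeric(d$v1)

# Convert via character instead; "." cannot be parsed and becomes NA.
# The coercion warning is silenced because NA is the intended result here.
d[2:3] <- lapply(d[2:3], function(x) {
  suppressWarnings(as.numeric(as.character(x)))
})

str(d)
```

With many variables you only need to change the index (2:3 here, 2:4 for df1); lapply() keeps each column as a separate vector rather than going through a matrix.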

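If the data come from a text file, the whole procedure can probably be skipped by declaring "." as the missing-value marker when reading. This is only a sketch, assuming whitespace-separated input with a header row; the inline `txt` string stands in for the real file, whose name is not given in this thread.

```r
# Hypothetical excerpt of the input; with a real file, pass its path
# as the first argument instead of using text = txt
txt <- "site v1 v2 v3
A 10 5 .
A 1654 0.4 4
B . . 4"

# na.strings = "." makes read.table() treat a lone "." as NA, so
# v1 to v3 are read as numbers (integer or double) instead of factors
df2 <- read.table(text = txt, header = TRUE, na.strings = ".")

str(df2)
```

Only fields that consist of "." alone are treated as missing, so values such as 0.4 are read normally.
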
HTH,
Jorge

On Wed, Sep 16, 2009 at 1:50 PM, Steve Hong wrote:

> Dear all,
>
> I have a partial data set with four columns.  The first column is "site",
> a factor with three levels (A, B, and C).  The second to fourth columns
> (v1 to v3) are my observations, in which "." indicates a missing value.
> To replace "." with NA, I used two steps.  First, I replaced "." with NA,
> and then I changed each variable from factor to numeric using
> "df1$v1 <- as.numeric(df1$v1)".  The second step was OK when I had a small
> number of variables, but it is painful when I have a lot of variables.
>
> My question is: is there a more efficient way to convert this kind of
> large-scale data?  In short, I am looking for an alternative to STEP 2,
> or to the whole procedure if there is one.
>
> Any comment will be highly appreciated.
>
> Thank you in advance!!
>
> Steve
>
> P.S.: Below is an example of what I did.
>
> STEP 1
> > df1
>   site   v1  v2 v3
> 1     A   10   5  .
> 2     A   22  54  .
> 3     A   44 214  2
> 4     A  521  14  4
> 5     A    5  73  1
> 6     A 1654 0.4  4
> 7     B   16   1  .
> 8     B    .   4  5
> 9     B    .   .  4
> 10    B    .   4  1
> 11    B   51   .  2
> 12    B    5   .  .
> 13    C    1 0.4  .
> 14    C    0   4  .
> 15    C    1   1  4
> 16    C   40   .  7
> 17    C    4   .  7
> 18    C   10   .  1
> > str(df1)
> 'data.frame':   18 obs. of  4 variables:
>  $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
>  $ v1  : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 1 1 1 ...
>  $ v2  : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 1 6 ...
>  $ v3  : Factor w/ 6 levels ".","1","2","4",..: 1 1 3 4 2 4 1 5 4 2 ...
> > df1[df1=="."] <- "NA"
> Warning messages:
> 1: In `[<-.factor`(`*tmp*`, thisvar, value = "NA") :
>  invalid factor level, NAs generated
> 2: In `[<-.factor`(`*tmp*`, thisvar, value = "NA") :
>  invalid factor level, NAs generated
> 3: In `[<-.factor`(`*tmp*`, thisvar, value = "NA") :
>  invalid factor level, NAs generated
> > df1
>   site   v1   v2   v3
> 1     A   10    5 <NA>
> 2     A   22   54 <NA>
> 3     A   44  214    2
> 4     A  521   14    4
> 5     A    5   73    1
> 6     A 1654  0.4    4
> 7     B   16    1 <NA>
> 8     B <NA>    4    5
> 9     B <NA> <NA>    4
> 10    B <NA>    4    1
> 11    B   51 <NA>    2
> 12    B    5 <NA> <NA>
> 13    C    1  0.4 <NA>
> 14    C    0    4 <NA>
> 15    C    1    1    4
> 16    C   40 <NA>    7
> 17    C    4 <NA>    7
> 18    C   10 <NA>    1
> > str(df1)
> 'data.frame':   18 obs. of  4 variables:
>  $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
>  $ v1  : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 NA NA NA ...
>  $ v2  : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 NA 6 ...
>  $ v3  : Factor w/ 6 levels ".","1","2","4",..: NA NA 3 4 2 4 NA 5 4 2 ...
>
> STEP 2.
>
> > df1$v1 <- as.numeric(df1$v1)
> > df1$v2 <- as.numeric(df1$v2)
> > df1$v3 <- as.numeric(df1$v3)
> > df1
>   site v1 v2 v3
> 1     A  4  7 NA
> 2     A  7  8 NA
> 3     A 10  5  3
> 4     A 13  4  4
> 5     A 11  9  2
> 6     A  6  2  4
> 7     B  5  3 NA
> 8     B NA  6  5
> 9     B NA NA  4
> 10    B NA  6  2
> 11    B 12 NA  3
> 12    B 11 NA NA
> 13    C  3  2 NA
> 14    C  2  6 NA
> 15    C  3  3  4
> 16    C  9 NA  6
> 17    C  8 NA  6
> 18    C  4 NA  2
> > str(df1)
> 'data.frame':   18 obs. of  4 variables:
>  $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
>  $ v1  : num  4 7 10 13 11 6 5 NA NA NA ...
>  $ v2  : num  7 8 5 4 9 2 3 6 NA 6 ...
>  $ v3  : num  NA NA 3 4 2 4 NA 5 4 2 ...
> >
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
