On 19/12/2018 6:48 AM, Luigi Marongiu wrote:
Thank you,
that worked fine for me.
Best wishes for a merry Christmas and a happy new year,
Luigi


Actually it's wrong!  Sorry about that.

If you look at my.data.new$column_2, you'll see that the values themselves have changed, because levels(x) <- thelevels relabels the existing levels by position:

> my.data
  column_1 column_2 column_3
1        A        B        A
2        B        B        A
3        C        C        B
4        D        E        B
5        E        E        A


> my.data.new
  column_1 column_2 column_3
1        A        A        A
2        B        A        A
3        C        B        B
4        D        C        B
5        E        C        A

What you want is this instead:

my.data.new <- as.data.frame(lapply(my.data, function(x) {factor(x, levels = thelevels)}))
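To check the difference, here is a minimal sketch using the data from the original question. It assumes thelevels is computed from the data as in the earlier message; the stopifnot() checks are only illustrative.

```r
# Data from the original question.
column_1 <- c("A", "B", "C", "D", "E")
column_2 <- c("B", "B", "C", "E", "E")
column_3 <- c("C", "C", "D", "D", "C")
my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)

# Union of all levels found in any column.
thelevels <- unique(unlist(lapply(my.data, levels)))

# Rebuild each factor against the shared level set; values are matched by label.
my.data.new <- as.data.frame(lapply(my.data, function(x) {
  factor(x, levels = thelevels)
}))

# The values should be unchanged, and every column should carry the same levels.
stopifnot(identical(as.character(my.data.new$column_2), column_2))
stopifnot(all(sapply(my.data.new, function(x) identical(levels(x), thelevels))))
str(my.data.new)
```

Because factor() matches on the labels rather than on the integer codes, the order of thelevels does not affect which value each row ends up with.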

The last example in the ?levels help page does this too. I wonder if that is intentional?

levels> ## we can add levels this way:
levels> f <- factor(c("a","b"))

levels> levels(f) <- c("c", "a", "b")

levels> f
[1] c a
Levels: c a b

levels> f <- factor(c("a","b"))

levels> levels(f) <- list(C = "C", A = "a", B = "b")

levels> f
[1] A B
Levels: C A B
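As a rough illustration of the difference (the variable names g, h and k are just for this sketch), compare the positional assignment from the help page with two ways of adding a level that leave the values alone:

```r
f <- factor(c("a", "b"))

# Positional assignment relabels: "a" becomes "c" and "b" becomes "a".
g <- f
levels(g) <- c("c", "a", "b")
as.character(g)

# Appending after the existing levels adds "c" and keeps the values as they were.
h <- f
levels(h) <- c(levels(h), "c")
as.character(h)

# factor() matches by label, so the new level can go in any position.
k <- factor(f, levels = c("c", "a", "b"))
as.character(k)
```

The levels<- form only preserves the values when the existing levels stay in their original positions at the front of the new vector.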

Duncan Murdoch

On Wed, Dec 19, 2018 at 12:19 PM Duncan Murdoch
<murdoch.dun...@gmail.com> wrote:

On 19/12/2018 5:58 AM, Luigi Marongiu wrote:
Dear all,
I have a data frame of character values in which each distinct character
is a level; however, not all columns contain the same characters, so when
I create the data frame with stringsAsFactors = TRUE, the levels differ
from column to column.
Is there a way to provide a single vector of levels and assign the
characters so that they match that vector?
Is there a way to do that not only when creating the data frame but
also when reading data from a file with read.table()?

For instance, I have:
column_1 = c("A", "B", "C", "D", "E")
column_2 = c("B", "B", "C", "E", "E")
column_3 = c("C", "C", "D", "D", "C")
my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)
str(my.data)
'data.frame': 5 obs. of  3 variables:
   $ column_1: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
   $ column_2: Factor w/ 3 levels "B","C","E": 1 1 2 3 3
   $ column_3: Factor w/ 2 levels "C","D": 1 1 2 2 1

Thank you


I don't think read.table() can do it for you automatically.  To do it
yourself, you need to get a vector of the levels.  If you know this,
just assign it to a variable; if you don't know it, compute it as

    thelevels <- unique(unlist(lapply(my.data, levels)))

Then set the levels of each column to thelevels:

    my.data.new <- as.data.frame(lapply(my.data, function(x) {levels(x) <- thelevels; x}))
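
For the read.table() part of the question, one possible approach is to read every column as character and build the factors afterwards. This is only a sketch: the inline text stands in for a hypothetical file, and the names input and raw are made up for the example. It uses factor(x, levels = ...), which matches values by label.

```r
# Hypothetical file contents, supplied inline through the text= argument.
input <- "column_1 column_2 column_3
A B C
B B C
C C D
D E D
E E C"

# Read every column as character so that no per-column levels are created.
raw <- read.table(text = input, header = TRUE, colClasses = "character")

# Shared levels: assign them directly if known, otherwise compute from the data.
thelevels <- sort(unique(unlist(raw)))

# Convert each column to a factor with the shared levels.
my.data <- as.data.frame(lapply(raw, factor, levels = thelevels))
str(my.data)
```

With a real file, the text= argument would be replaced by the file name.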

Duncan Murdoch




______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
