Re: [R] Compact Indicator Matrices
On Sun, May 11, 2008 at 9:49 AM, amarkos [EMAIL PROTECTED] wrote:
> On May 11, 4:47 pm, Douglas Bates [EMAIL PROTECTED] wrote:
>> Do you mean that you want to collapse similar rows into a single row and
>> perhaps a count of the number of times that this row occurs?
>
> Let me rephrase the problem by providing an example.
>
> Input:
> A =
>       [,1] [,2]
>  [1,]    1    1
>  [2,]    1    3
>  [3,]    2    1
>  [4,]    1    2
>  [5,]    2    1
>  [6,]    1    2
>  [7,]    1    1
>  [8,]    1    2
>  [9,]    1    3
> [10,]    2    1

An important question here is: do you start with two or more variables like the columns of your matrix A? If so, there is a more direct method of getting the answers that you want.

The natural way to store such variables in R is as factors. I prefer to use letters instead of numbers to represent the levels of a factor (that way I don't confuse a factor with a numeric variable when I look at rows), so I would create a data frame with two factors instead of a matrix.

    > V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
    > V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
    > df <- data.frame(f1 = V1, f2 = V2)
    > df
       f1 f2
    1   A  a
    2   A  c
    3   B  a
    4   A  b
    5   B  a
    6   A  b
    7   A  a
    8   A  b
    9   A  c
    10  B  a

You could produce the indicator matrix and check for unique rows, etc. (I will show that below), but all you need is the interaction of the two factors:

    > df$f12 <- with(df, f1:f2)[drop = TRUE]
    > df
       f1 f2 f12
    1   A  a A:a
    2   A  c A:c
    3   B  a B:a
    4   A  b A:b
    5   B  a B:a
    6   A  b A:b
    7   A  a A:a
    8   A  b A:b
    9   A  c A:c
    10  B  a B:a
    > str(df)
    'data.frame':   10 obs. of  3 variables:
     $ f1 : Factor w/ 2 levels "A","B": 1 1 2 1 2 1 1 1 1 2
     $ f2 : Factor w/ 3 levels "a","b","c": 1 3 1 2 1 2 1 2 3 1
     $ f12: Factor w/ 4 levels "A:a","A:b","A:c",..: 1 3 4 2 4 2 1 2 3 4
    > table(df$f12)

    A:a A:b A:c B:a
      2   3   2   3
    > as.numeric(df$f12)
     [1] 1 3 4 2 4 2 1 2 3 4

Notice that this shows you that there are four distinct combinations that occur 2, 3, 2 and 3 times, respectively; the first combination occurs in rows 1 and 7 and consists of the first level of f1 and the first level of f2, etc.
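A minimal sketch of a compact form built directly from the factors, assuming the df construction shown above (repeated so the snippet runs on its own): keep the first row of each observed combination with duplicated() and attach its count from table(). The names first and compact are illustrative only, not from the thread.

```r
# Rebuild the example data frame of two factors
V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
df <- data.frame(f1 = V1, f2 = V2)

# Interaction factor, dropping combinations that never occur
df$f12 <- with(df, f1:f2)[drop = TRUE]

# Keep the first row of each observed combination
first <- !duplicated(df$f12)
compact <- df[first, c("f1", "f2", "f12")]

# Attach the frequency of each combination, looked up by level name
compact$count <- as.vector(table(df$f12)[as.character(compact$f12)])
compact
```

Rows come out in order of first occurrence (A:a, A:c, B:a, A:b) rather than level order; compact[order(compact$f12), ] would reorder them if that is preferred.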
If you really do want the indicator matrix, you could generate it as

    > (ind <- cbind(model.matrix(~ 0 + f1, df), model.matrix(~ 0 + f2, df)))
       f1A f1B f2a f2b f2c
    1    1   0   1   0   0
    2    1   0   0   0   1
    3    0   1   1   0   0
    4    1   0   0   1   0
    5    0   1   1   0   0
    6    1   0   0   1   0
    7    1   0   1   0   0
    8    1   0   0   1   0
    9    1   0   0   0   1
    10   0   1   1   0   0
    > unique(ind)
      f1A f1B f2a f2b f2c
    1   1   0   1   0   0
    2   1   0   0   0   1
    3   0   1   1   0   0
    4   1   0   0   1   0

but working with the factors is generally much simpler than working with the indicators.

>     # Indicator matrix
>     # (store the factors in obj, the name used by the rest of the code)
>     obj <- data.frame(lapply(data.frame(A), as.factor))
>     nocases <- dim(obj)[1]
>     novars <- dim(obj)[2]
>     # variable levels
>     levels.n <- sapply(obj, nlevels)
>     n <- cumsum(levels.n)
>     # Indicator matrix calculations
>     Z <- matrix(0, nrow = nocases, ncol = n[length(n)])
>     newdat <- lapply(obj, as.numeric)
>     offset <- c(0, n[-length(n)])
>     # set Z[r, offset[i] + level code] to 1 via its column-major linear index
>     for (i in 1:novars)
>       Z[1:nocases + (nocases * (offset[i] + newdat[[i]] - 1))] <- 1
>
> ### Output:
> Z =
>       [,1] [,2] [,3] [,4] [,5]
>  [1,]    1    0    1    0    0
>  [2,]    1    0    0    0    1
>  [3,]    0    1    1    0    0
>  [4,]    1    0    0    1    0
>  [5,]    0    1    1    0    0
>  [6,]    1    0    0    1    0
>  [7,]    1    0    1    0    0
>  [8,]    1    0    0    1    0
>  [9,]    1    0    0    0    1
> [10,]    0    1    1    0    0
>
> Z is an indicator matrix in the Multiple Correspondence Analysis framework.
> My problem is to collapse identical rows (e.g. 2 and 9) into a single row
> and store the row ids.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
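One way to collapse identical rows of Z and still store the row ids, sketched in base R under the assumption that Z is a plain numeric 0/1 matrix (typed in here from the output above): build a string key per row, keep the first occurrence of each key with duplicated(), and map every original row to its compact row with match(). The names key, Zc, grp and freq are hypothetical.

```r
# Indicator matrix Z from the example, typed in row by row
Z <- matrix(c(1,0,1,0,0,
              1,0,0,0,1,
              0,1,1,0,0,
              1,0,0,1,0,
              0,1,1,0,0,
              1,0,0,1,0,
              1,0,1,0,0,
              1,0,0,1,0,
              1,0,0,0,1,
              0,1,1,0,0), nrow = 10, byrow = TRUE)

# One string key per row, e.g. "10100"
key <- apply(Z, 1, paste, collapse = "")

# Compact matrix: the first occurrence of each distinct profile
first <- !duplicated(key)
Zc <- Z[first, , drop = FALSE]

# For every original row, the index of its row in Zc
grp <- match(key, key[first])

# How many original rows were collapsed into each compact row
freq <- tabulate(grp, nbins = nrow(Zc))
```

Since Zc[grp, ] reproduces Z exactly, the pair (Zc, grp) loses no information, and freq can serve as row weights if the collapsed rows are analysed instead of the originals.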
Re: [R] Compact Indicator Matrices
On Mon, May 12, 2008 at 11:27 AM, amarkos [EMAIL PROTECTED] wrote:
> Thanks, it works!
> Could you please provide the direct method you mentioned for the
> multivariate case?

I'm not sure what you mean. I looked at what I wrote and I don't see anything that would fit that description.

May I suggest that you continue to cc: the R-help list on the discussion. I can't always respond rapidly to requests and there are many who read the list that can.

> Angelos Markos
> Dr. of Applied Informatics, University of
Re: [R] Compact Indicator Matrices
Thanks. It works!

I think I found another solution, working directly with the indicator matrix:

    count <- factor(table(apply(ind, 1, paste, collapse = "")))

However, that way I can't store the indices of the collapsed rows.

-Angelos Markos
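The paste() keys in that one-liner can also recover the indices: splitting the row numbers by the same key gives, for each distinct row, the rows that were collapsed into it. A sketch assuming ind is built as in Douglas Bates's reply; counts and rows are illustrative names.

```r
# Indicator matrix for the two-factor example, built as in the thread
V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
df <- data.frame(f1 = V1, f2 = V2)
ind <- cbind(model.matrix(~ 0 + f1, df), model.matrix(~ 0 + f2, df))

# One string key per row of the indicator matrix, e.g. "10100"
key <- apply(ind, 1, paste, collapse = "")

# Count of each distinct row
counts <- table(key)

# Row numbers behind each count; names(rows) match names(counts)
rows <- split(seq_len(nrow(ind)), key)
rows
```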
Re: [R] Compact Indicator Matrices
On Sat, May 10, 2008 at 5:27 AM, amarkos [EMAIL PROTECTED] wrote:
> An indicator matrix is a binary matrix with orthogonal columns whose rows
> sum to 1. A row of this matrix could be [0 1 0 0].
> My problem is to group the similar rows (profiles) so as to create a
> compact form of the matrix.

I'm not sure exactly what you mean by a compact form of this matrix. Do you mean that you want to collapse similar rows into a single row and perhaps a count of the number of times that this row occurs?

In R, indicator matrices are typically generated from a factor, and essentially you are asking for the tabulation of the factor, such as provided by the functions table and xtabs.

> Is there an R function that deals with this problem or do I have to write
> it from scratch?
>
> Thanks,
> Angelos Markos
> Dr. Applied Informatics, University of Macedonia, Greece
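To illustrate that tabulating the factor carries the same information as the column sums of its indicator matrix, here is a small sketch with a made-up factor f (hypothetical data, not from the thread):

```r
# A single factor and its indicator (dummy) matrix, one column per level
f <- factor(c("b", "a", "b", "c", "b", "a"))
X <- model.matrix(~ 0 + f)

# Each row of X contains exactly one 1
rowSums(X)

# Column sums of the indicator matrix are the level counts ...
colSums(X)

# ... which is what table() and xtabs() report for the factor itself
table(f)
xtabs(~ f)
```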
Re: [R] Compact Indicator Matrices
On May 11, 4:47 pm, Douglas Bates [EMAIL PROTECTED] wrote:
> Do you mean that you want to collapse similar rows into a single row and
> perhaps a count of the number of times that this row occurs?

Let me rephrase the problem by providing an example.

Input:
A =
      [,1] [,2]
 [1,]    1    1
 [2,]    1    3
 [3,]    2    1
 [4,]    1    2
 [5,]    2    1
 [6,]    1    2
 [7,]    1    1
 [8,]    1    2
 [9,]    1    3
[10,]    2    1

    # Indicator matrix
    # (store the factors in obj, the name used by the rest of the code)
    obj <- data.frame(lapply(data.frame(A), as.factor))
    nocases <- dim(obj)[1]
    novars <- dim(obj)[2]
    # variable levels
    levels.n <- sapply(obj, nlevels)
    n <- cumsum(levels.n)
    # Indicator matrix calculations
    Z <- matrix(0, nrow = nocases, ncol = n[length(n)])
    newdat <- lapply(obj, as.numeric)
    offset <- c(0, n[-length(n)])
    # set Z[r, offset[i] + level code] to 1 via its column-major linear index
    for (i in 1:novars)
      Z[1:nocases + (nocases * (offset[i] + newdat[[i]] - 1))] <- 1

### Output:
Z =
      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    0    1    0    0
 [2,]    1    0    0    0    1
 [3,]    0    1    1    0    0
 [4,]    1    0    0    1    0
 [5,]    0    1    1    0    0
 [6,]    1    0    0    1    0
 [7,]    1    0    1    0    0
 [8,]    1    0    0    1    0
 [9,]    1    0    0    0    1
[10,]    0    1    1    0    0

Z is an indicator matrix in the Multiple Correspondence Analysis framework. My problem is to collapse identical rows (e.g. 2 and 9) into a single row and store the row ids.
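The explicit loop with linear indices can also be written without computing offsets, by comparing each variable's level codes with 1..nlevels via outer() and binding the resulting 0/1 blocks with cbind(). This is a swapped-in alternative, not the poster's method; like the code above, it assumes every column of A is turned into a factor.

```r
# Example input matrix A from the thread
A <- matrix(c(1,1, 1,3, 2,1, 1,2, 2,1, 1,2, 1,1, 1,2, 1,3, 2,1),
            ncol = 2, byrow = TRUE)
obj <- data.frame(lapply(data.frame(A), as.factor))

# For each variable: a cases-by-levels 0/1 block, with a 1 where the
# case's level code equals the column number
blocks <- lapply(obj, function(f)
  1 * outer(as.integer(f), seq_len(nlevels(f)), "=="))

# Place the blocks side by side: all levels of variable 1, then variable 2
Z <- do.call(cbind, blocks)
Z
```

The column order matches the loop version, so either construction can feed the row-collapsing step.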
[R] Compact Indicator Matrices
An indicator matrix is a binary matrix with orthogonal columns whose rows sum to 1. A row of this matrix could be [0 1 0 0]. My problem is to group the similar rows (profiles) so as to create a compact form of the matrix. Is there an R function that deals with this problem, or do I have to write it from scratch?

Thanks,
Angelos Markos
Dr. Applied Informatics, University of Macedonia, Greece
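For the single-variable case described here (each row contains exactly one 1), a row is fully identified by the position of its 1, so max.col() turns the matrix back into category codes and table() then gives the compact counts. A sketch with made-up data; it does not apply directly when several variables are stacked side by side, because those rows contain more than one 1.

```r
# A made-up indicator matrix for one variable with 4 categories
Z <- matrix(c(0,1,0,0,
              1,0,0,0,
              0,1,0,0,
              0,0,0,1,
              0,1,0,0), ncol = 4, byrow = TRUE)

# Column holding the 1 in each row = that row's category
cat_of_row <- max.col(Z, ties.method = "first")

# Grouping identical rows is then a tabulation of the categories,
# keeping empty categories via explicit factor levels
table(factor(cat_of_row, levels = seq_len(ncol(Z))))
```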