Re: [R] Compact Indicator Matrices

2008-05-12 Thread Douglas Bates
On Sun, May 11, 2008 at 9:49 AM, amarkos [EMAIL PROTECTED] wrote:
 On May 11, 4:47 pm, Douglas Bates [EMAIL PROTECTED] wrote:

 Do you mean that you want to collapse similar rows into a single row
 and perhaps a count of the number of times that this row occurs?

 Let me rephrase the problem by providing an example.

 Input:

 A =
       [,1] [,2]
  [1,]    1    1
  [2,]    1    3
  [3,]    2    1
  [4,]    1    2
  [5,]    2    1
  [6,]    1    2
  [7,]    1    1
  [8,]    1    2
  [9,]    1    3
 [10,]    2    1

An important question here is whether you start with two or more
variables like the columns of your matrix A.  If so, there is a more
direct method of getting the answers that you want.  The natural way to
store such variables in R is as factors.  I prefer to use letters
instead of numbers to represent the levels of a factor (that way I
don't confuse a factor with a numeric variable when I look at rows), so
I would create a data frame with two factors instead of a matrix.

> V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
> V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
> df <- data.frame(f1 = V1, f2 = V2)
> df
   f1 f2
1   A  a
2   A  c
3   B  a
4   A  b
5   B  a
6   A  b
7   A  a
8   A  b
9   A  c
10  B  a

You could produce the indicator matrix and check for unique rows, etc.
(I will show that below), but all you need is the interaction of the
two factors:

> df$f12 <- with(df, f1:f2)[drop = TRUE]
> df
   f1 f2 f12
1   A  a A:a
2   A  c A:c
3   B  a B:a
4   A  b A:b
5   B  a B:a
6   A  b A:b
7   A  a A:a
8   A  b A:b
9   A  c A:c
10  B  a B:a
> str(df)
'data.frame':   10 obs. of  3 variables:
 $ f1 : Factor w/ 2 levels "A","B": 1 1 2 1 2 1 1 1 1 2
 $ f2 : Factor w/ 3 levels "a","b","c": 1 3 1 2 1 2 1 2 3 1
 $ f12: Factor w/ 4 levels "A:a","A:b","A:c",..: 1 3 4 2 4 2 1 2 3 4
> table(df$f12)

A:a A:b A:c B:a
  2   3   2   3
> as.numeric(df$f12)
 [1] 1 3 4 2 4 2 1 2 3 4

Notice that this shows you that there are four distinct combinations,
occurring 2, 3, 2 and 3 times respectively.  The first combination
occurs in rows 1 and 7 and consists of the first level of f1 and the
first level of f2, and so on.
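Since the original question also asks for the row ids, one way to get them (a sketch added here, not code from the thread) is to split the row indices by the interaction factor built above. The example data are rebuilt so the snippet runs on its own; `lengths()` assumes R >= 3.2.0.

```r
# Example data from the thread
V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
df <- data.frame(f1 = V1, f2 = V2)
df$f12 <- with(df, f1:f2)[drop = TRUE]

# Row ids for each distinct combination: one list element per level of f12
rows <- split(seq_len(nrow(df)), df$f12)
print(rows)

# Number of rows per combination; the same counts as table(df$f12)
print(lengths(rows))
```

Here rows[["A:c"]] holds 2 and 9, the pair of identical rows mentioned in the question.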

If you really do want the indicator matrix, you could generate it as

> (ind <- cbind(model.matrix(~ 0 + f1, df), model.matrix(~ 0 + f2, df)))
   f1A f1B f2a f2b f2c
1    1   0   1   0   0
2    1   0   0   0   1
3    0   1   1   0   0
4    1   0   0   1   0
5    0   1   1   0   0
6    1   0   0   1   0
7    1   0   1   0   0
8    1   0   0   1   0
9    1   0   0   0   1
10   0   1   1   0   0
> unique(ind)
  f1A f1B f2a f2b f2c
1   1   0   1   0   0
2   1   0   0   0   1
3   0   1   1   0   0
4   1   0   0   1   0

but working with the factors is generally much simpler than working
with the indicators.
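Working directly with the indicator matrix instead, a sketch of collapsing identical rows while keeping the mapping back to the original rows could use one string key per row together with duplicated() and match(). This is an illustration added here, not code from the thread; the names key, compact, profile and counts are hypothetical.

```r
# Rebuild the indicator matrix from the example above
V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
df <- data.frame(f1 = V1, f2 = V2)
ind <- cbind(model.matrix(~ 0 + f1, df), model.matrix(~ 0 + f2, df))

# One string key per row, e.g. "10100"
key <- apply(ind, 1, paste, collapse = "")

# Compact matrix: first occurrence of each distinct row (same rows as unique(ind))
compact <- ind[!duplicated(key), , drop = FALSE]

# For each original row, the index of its profile in 'compact'
profile <- match(key, key[!duplicated(key)])

# Number of original rows collapsed into each compact row
counts <- tabulate(profile)

print(compact)
print(profile)
print(counts)
```

Original row i is represented by compact[profile[i], ], so which(profile == k) recovers the row ids behind compact row k.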


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




Re: [R] Compact Indicator Matrices

2008-05-12 Thread Douglas Bates
On Mon, May 12, 2008 at 11:27 AM, amarkos [EMAIL PROTECTED] wrote:
 Thanks, it works!

 Could you please provide the direct method you mentioned for the
 multivariate case?

I'm not sure what you mean.  I looked at what I wrote and I don't see
anything that would fit that description.

May I suggest that you continue to cc: the R-help list on the
discussion.  I can't always respond rapidly to requests, and there are
many readers of the list who can.


 Angelos Markos
 Dr. of Applied Informatics,
 University of 

Re: [R] Compact Indicator Matrices

2008-05-12 Thread amarkos
Thanks. It works!

I think I found another solution, working straight with the indicator
matrix.

> count <- factor(table(apply(ind, 1, paste, collapse="")))

However, that way I can't store the indices of the collapsed rows.

-Angelos Markos
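A sketch of how the paste() approach above could also keep the indices (added for illustration; the object names key and ids are assumptions): building the key once and splitting the row numbers by it gives the row ids of each profile, in the same order as the table.

```r
# Rebuild the indicator matrix 'ind' from earlier in the thread
V1 <- factor(c(1,1,2,1,2,1,1,1,1,2), labels = LETTERS[1:2])
V2 <- factor(c(1,3,1,2,1,2,1,2,3,1), labels = letters[1:3])
df <- data.frame(f1 = V1, f2 = V2)
ind <- cbind(model.matrix(~ 0 + f1, df), model.matrix(~ 0 + f2, df))

# Same per-row key as in the paste() solution
key <- apply(ind, 1, paste, collapse = "")

count <- table(key)                      # number of rows per profile
ids   <- split(seq_len(nrow(ind)), key)  # row indices per profile

print(count)
print(ids)
```

Both table() and split() order the profiles by the sorted keys, so names(count) and names(ids) line up element by element.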



Re: [R] Compact Indicator Matrices

2008-05-11 Thread Douglas Bates
On Sat, May 10, 2008 at 5:27 AM, amarkos [EMAIL PROTECTED] wrote:
 An indicator matrix is a binary matrix with orthogonal columns whose
 rows sum to 1. A row of this matrix could be [0 1 0 0]. My problem is
 to group similar rows (profiles) so as to create a compact form
 of the matrix.

I'm not sure exactly what you mean by a compact form of this matrix.
Do you mean that you want to collapse similar rows into a single row
and perhaps a count of the number of times that this row occurs?

In R, indicator matrices are typically generated from a factor, and
essentially you are asking for the tabulation of that factor, such as
is provided by the functions table and xtabs.

 Is there an R function that deals with this problem or do I have to
 write it from scratch?

 Thanks,
 Angelos Markos
 Dr. Applied Informatics,
 University of Macedonia, Greece
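To illustrate the point that the compact form is essentially a tabulation of the factor, here is a small sketch with a made-up factor f (hypothetical data, not from the thread): the column sums of its indicator matrix equal table(f), and the factor can be recovered from the indicator matrix with max.col().

```r
# Hypothetical factor, for illustration only
f <- factor(c("b", "a", "c", "a", "b", "a"))

# Indicator matrix: one 0/1 column per level, each row sums to 1
Z <- model.matrix(~ 0 + f)

# Column sums of the indicator matrix are the level counts
print(colSums(Z))
print(table(f))
print(xtabs(~ f))

# Recover the factor: the position of the single 1 in each row is the level
f_back <- factor(levels(f)[max.col(Z)], levels = levels(f))
print(all(f_back == f))
```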



Re: [R] Compact Indicator Matrices

2008-05-11 Thread amarkos
On May 11, 4:47 pm, Douglas Bates [EMAIL PROTECTED] wrote:

 Do you mean that you want to collapse similar rows into a single row
 and perhaps a count of the number of times that this row occurs?

Let me rephrase the problem by providing an example.

Input:

A =
      [,1] [,2]
 [1,]    1    1
 [2,]    1    3
 [3,]    2    1
 [4,]    1    2
 [5,]    2    1
 [6,]    1    2
 [7,]    1    1
 [8,]    1    2
 [9,]    1    3
[10,]    2    1

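# Indicator matrix
# Convert the columns of the input matrix A to factors
obj <- data.frame(lapply(data.frame(A), as.factor))

nocases <- dim(obj)[1]
novars  <- dim(obj)[2]

# variable levels
levels.n <- sapply(obj, nlevels)
n        <- cumsum(levels.n)

# Indicator matrix calculations
Z        <- matrix(0, nrow = nocases, ncol = n[length(n)])
newdat   <- lapply(obj, as.numeric)   # integer level codes of each variable
offset   <- (c(0, n[-length(n)]))     # first column of each variable's block, minus 1
for (i in 1:novars)  # column-major linear index of (row, offset + level)
  Z[1:nocases + (nocases * (offset[i] + newdat[[i]] - 1))] <- 1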

###

Output:

Z =

      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    0    1    0    0
 [2,]    1    0    0    0    1
 [3,]    0    1    1    0    0
 [4,]    1    0    0    1    0
 [5,]    0    1    1    0    0
 [6,]    1    0    0    1    0
 [7,]    1    0    1    0    0
 [8,]    1    0    0    1    0
 [9,]    1    0    0    0    1
[10,]    0    1    1    0    0


Z is an indicator matrix in the Multiple Correspondence Analysis
framework. My problem is to collapse identical rows (e.g. 2 and 9)
into a single row and store the row ids.
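The loop in the question fills Z through column-major linear indexing: entry (r, c) of a matrix with nocases rows sits at position r + nocases * (c - 1). A sketch of the same construction using two-column matrix indexing instead (an alternative added here, not the poster's code; A is rebuilt from the printed example). Note that with two variables each row of Z sums to 2 rather than 1.

```r
# The example input matrix from the question
A <- cbind(c(1,1,2,1,2,1,1,1,1,2), c(1,3,1,2,1,2,1,2,3,1))

obj    <- data.frame(lapply(data.frame(A), as.factor))
nlev   <- sapply(obj, nlevels)
offset <- c(0, cumsum(nlev)[-length(nlev)])  # columns used by earlier variables

Z <- matrix(0, nrow = nrow(obj), ncol = sum(nlev))
for (i in seq_along(obj)) {
  # (row, column) pairs of the 1s contributed by variable i
  Z[cbind(seq_len(nrow(obj)), offset[i] + as.integer(obj[[i]]))] <- 1
}
print(Z)
```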



[R] Compact Indicator Matrices

2008-05-10 Thread amarkos
An indicator matrix is a binary matrix with orthogonal columns whose
rows sum to 1. A row of this matrix could be [0 1 0 0]. My problem is
to group similar rows (profiles) so as to create a compact form
of the matrix.

Is there an R function that deals with this problem or do I have to
write it from scratch?

Thanks,
Angelos Markos
Dr. Applied Informatics,
University of Macedonia, Greece
