Re: [R] Remove similar rows from matrix

2012-08-23 Thread PIKAL Petr
Hi

I cannot reproduce exactly what you want, but maybe you can elaborate on this
to suit your needs.

sel1 <- rowSums(is.na(mat)) # number of NA values in each row
sel2 <- c(0, rowSums(apply(mat, 2, diff) == 0, na.rm = TRUE)) # number of columns equal to the previous row

but the first row of a run is not counted as the same, therefore I also add the value for the next row

sel <- c(rowSums(embed(sel2, 2)), 0)

and here I select only the rows which are unique or do not have any NA
mat[(sel1*sel)==0,]

This is not exactly what you want, as one of the rows starting with 328 should
be included. So there has to be another trick, but I cannot come up with one.
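
As a small illustration of the building block used above, the following sketch shows how apply(m, 2, diff) flags a row that repeats its predecessor, and why a row that differs only by an NA is missed unless NA differences are counted as matches. The toy matrix and all variable names are assumptions for illustration, not taken from the thread.

```r
# Toy data (hypothetical): row 2 repeats row 1, row 3 differs from row 2 only by an NA
toy <- rbind(c(1, 10),
             c(1, 10),
             c(1, NA),
             c(2, 20))

# Row-to-row differences, one column at a time (3 x 2 matrix here)
d <- apply(toy, 2, diff)

# A row is an exact repeat of the previous row when every difference is 0;
# differences involving an NA are NA and are ignored by na.rm = TRUE
same_as_prev <- c(FALSE, rowSums(d == 0, na.rm = TRUE) == ncol(toy))

# Variant: also treat an NA difference as "no visible disagreement"
same_or_na <- c(FALSE, rowSums(d == 0 | is.na(d)) == ncol(toy))

print(same_as_prev)
print(same_or_na)
```

Note that apply(m, 2, diff) returns a plain vector rather than a matrix when m has only two rows, so a sketch like this needs at least three rows or an explicit matrix() call around the result.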

Regards
Petr

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Remove similar rows from matrix

2012-08-23 Thread Rui Barradas

Hello,

Here's another close solution.


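na_count <- rowSums(is.na(mat))                       # NAs per row
mat1 <- mat[na_count <= 2, ]                          # keep rows with at most two NAs
diff_mat1 <- rbind( mat1[1, ], apply(mat1, 2, diff) ) # differences to the previous row
no <- is.na(diff_mat1) | diff_mat1 == 0               # cells that do not differ visibly
yes <- !apply(no, 1, all)                             # rows that differ somewhere
mat1.1 <- mat1[yes, ]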

all.equal( mat1.1, mat2 )  # Not quite

why1 <- 1*(is.na(mat1.1) & is.na(mat2))
why2 <- 1*(is.na(mat1.1) | is.na(mat2))
sum(why1); sum(why2)

why2 - why1
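
The NA comparison above can be tried on a toy pair of matrices. This is only a sketch: the matrices a and b and the variable names are hypothetical stand-ins for mat1.1 and mat2. Subtracting the "both NA" indicator from the "either NA" indicator leaves a 1 exactly where one matrix has an NA and the other does not.

```r
# Hypothetical stand-ins for mat1.1 and mat2 (same dimensions)
a <- rbind(c(1, NA),
           c(NA, 4))
b <- rbind(c(1, NA),
           c(3, 4))

both_na   <- 1 * (is.na(a) & is.na(b))  # NA in both matrices
either_na <- 1 * (is.na(a) | is.na(b))  # NA in at least one matrix

# 1 marks cells where exactly one of the two matrices has an NA
mismatch <- either_na - both_na

# Row/column positions of the mismatching cells
print(which(mismatch == 1, arr.ind = TRUE))
```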

Why: in a sequence of equal rows, the first is always kept, even if
it has an NA where the others don't.
So maybe now the OP could use a similar method, but starting from the bottom,
and then, from both solutions, keep the rows with fewer NAs.
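
One possible way to sketch the "keep the rows with fewer NAs" idea without relying on row order is to compare every pair of rows. The helper name drop_covered_rows and the toy matrix are hypothetical, not from the thread. It assumes that a row should be dropped when another row agrees with it in every column where it has a value and that other row has more values filled in. Among fully identical rows only the first is kept.

```r
# Hypothetical helper: row i is "covered" by row j when every non-NA entry
# of row i equals the entry of row j in the same column.
drop_covered_rows <- function(m) {
  n <- nrow(m)
  covered_by <- function(i, j) {
    obs <- !is.na(m[i, ])                       # columns observed in row i
    all(!is.na(m[j, obs]) & m[i, obs] == m[j, obs])
  }
  keep <- vapply(seq_len(n), function(i) {
    others <- setdiff(seq_len(n), i)
    # Drop row i if some row j covers it and either has more values
    # (coverage is one-way) or is an identical row that comes earlier
    !any(vapply(others, function(j) {
      covered_by(i, j) && (!covered_by(j, i) || j < i)
    }, logical(1)))
  }, logical(1))
  m[keep, , drop = FALSE]
}

# Toy data (hypothetical), loosely modelled on the rows starting 328 and 338
toy <- rbind(c(328,   NA, 1045, NA),
             c(328, 1159, 1045, NA),
             c(338, 1166, 1051, 82),
             c(338, 1166,   NA, 82),
             c(338, 1166, 1051, 82))

res <- drop_covered_rows(toy)
print(res)   # rows 2 and 3 of toy remain
```

The pairwise comparison is quadratic in the number of rows, which should be acceptable for a matrix of this size but may be slow for large inputs.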

I'll give it some thought later.

Hope this helps,

Rui Barradas




[R] Remove similar rows from matrix

2012-08-22 Thread Tonja Krueger

   Hi everybody,

   I have a matrix (mat) from which I want to remove all rows that differ from
   other rows in that matrix only by having one or two NAs instead of
   numbers.

   Preferably, I would like to remove the rows with more NAs, so that in the
   end the matrix looks like mat2.

   Has someone done something similar before? Thanks for helping, Tonja


   Here is my example:

   ex <- c(14, 56, 114, 132, 187, 279, 324, 328, 328, 338, 338, 338, 346, 346,
   395, 398, 428, 428, 428, 452, 452, 452, NA, 466, 467, 525, 894, 923, 968,
   980, 1030, 1117, 1156, NA, 1159, 1166, 1166, 1166, 1171, 1171, 1209, 1211,
   1235, 1235, 1235, 1275, 1275, 1275, NA, 1291, 1292, 1378, 829, 851, 880,
   893, 929, 1003, 1042, 1045, 1045, 1051, 1051, 1051, 1057, 1057, 1097, 1099,
   1119, 1119, 1119, 1147, 1147, 1147, 1147, 1167, 1168, 1235, 494, 510, 533,
   538, 567, 623, 657, 660, 660, 666, 666, 666, 671, 671, 699, 702, NA, 722,
   722, NA, NA, 744, 744, 759, 760, 816, 276, 293, 312, 318, 338, NA, NA, 418,
   418, 424, 424, NA, 429, 429, NA, NA, 468, 468, 468, 490, 490, 490, 490, 508,
   509, 568, 674, 696, 726, 734, 774, 851, 893, 896, 896, 903, 903, 903, 908,
   908, 944, 947, 966, 966, 966, NA, 998, 998, 998, 1014, 1015, 1091, 421, 446,
   472, 490, 510, 582, 624, 627, 627, 633, 633, NA, 640, 640, 669, 671, 685,
   685, 685, 716, 716, 716, 716, 736, 737, 798, NA, NA, NA, NA, NA, NA, 74, NA,
   NA, 82, NA, 82, 86, NA, 104, NA, 114, NA, 114, 119, 119, 119, 119, NA, NA,
   NA)

   mat <- matrix(ex, ncol=8)


   ex2 <- c(14, 56, 114, 132, 187, 279, 324, 328, 338, 346, 395, 398, 428, 452,
   466, 467, 525, 894, 923, 968, 980, 1030, 1117, 156, 1159, 1166, 1171, 1209,
   1211, 1235, 1275, 1291, 1292, 1378, 829, 851, 880, 893, 929, 1003, 1042,
   1045, 1051, 1057, 1097, 1099, 1119, 1147, 1167, 1168, 1235, 494, 510, 533,
   538, 567, 623, 657, 660, 666, 671, 699, 702, 722, 744, 759, 760, 816, 276,
   293, 312, 318, 338, NA, NA, 418, 424, 429, NA, NA, 468, 490, 508, 509, 568,
   674, 696, 726, 734, 774, 851, 893, 896, 903, 908, 944, 947, 966, 998, 1014,
   1015, 1091, 421, 446, 472, 490, 510, 582, 624, 627, 633, 640, 669, 671, 685,
   716, 736, 737, 798, NA, NA, NA, NA, NA, NA, 74, NA, 82, 86, 104, NA, 114,
   119, NA, NA, NA)

   mat2 <- matrix(ex2, ncol=8)