Re: [R] Remove similar rows from matrix
Hi

I cannot reproduce exactly what you want, but maybe you can adapt this to suit your needs.

sel1 <- rowSums(is.na(mat))  # number of NA values per row
# values each row shares with the previous row; the first row has no
# predecessor, so a 0 is added for it
sel2 <- c(0, rowSums(apply(mat, 2, diff) == 0, na.rm = TRUE))
# values shared with the previous plus the next row (0 for the last row)
sel <- c(rowSums(embed(sel2, 2)), 0)
# select only rows which either are unique or have no NA
mat[(sel1 * sel) == 0, ]

This is not exactly what you want, as one of the rows starting with 328 should be included. So there has to be another trick, but I cannot come up with one.

Regards
Petr

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Tonja Krueger
Sent: Wednesday, August 22, 2012 10:16 AM
To: r-help@r-project.org
Subject: [R] Remove similar rows from matrix

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
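One possible trick for the two rows starting with 328 is to treat rows as matching when they agree wherever both have a value, and then keep the one with fewer NAs. A minimal sketch on rows 8 and 9 of mat, copied by hand; the helper name same_where_known is a hypothetical choice. Agreement alone does not settle every case: rows whose NAs sit in different columns (such as the rows starting with 428) match each other but neither contains all the information of the other.

```r
# Rows 8 and 9 of Tonja's `mat` (the two rows starting with 328)
r8 <- c(328, NA, 1045, 660, 418, 896, 627, NA)
r9 <- c(328, 1159, 1045, 660, 418, 896, 627, NA)

# Hypothetical helper: compare only the positions where both rows are known
same_where_known <- function(a, b) {
  known <- !is.na(a) & !is.na(b)
  all(a[known] == b[known])
}

same_where_known(r8, r9)  # TRUE: the rows agree wherever both are known
sum(is.na(r8))            # 2
sum(is.na(r9))            # 1, so r9 is the row to keep
```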
Re: [R] Remove similar rows from matrix
Hello,

Here's another close solution.

na_count <- rowSums(is.na(mat))
mat1 <- mat[na_count <= 2, ]
diff_mat1 <- rbind( mat1[1, ], apply(mat1, 2, diff) )
no <- is.na(diff_mat1) | diff_mat1 == 0
yes <- !apply(no, 1, all)
mat1.1 <- mat1[yes, ]
all.equal( mat1.1, mat2 ) # Not quite

why1 <- 1*(is.na(mat1.1) & is.na(mat2))
why2 <- 1*(is.na(mat1.1) | is.na(mat2))
sum(why1); sum(why2)
why2 - why1

Why: in a sequence of equal rows, the first is always kept, even if it has an NA where the others don't. So maybe the OP could use a similar method, but starting from below, and then, from both solutions, keep the rows with fewer NAs. I'll give it some thought later.

Hope this helps,

Rui Barradas

On 23-08-2012 13:09, PIKAL Petr wrote:
[R] Remove similar rows from matrix
Hi everybody,

I have a matrix (mat) from which I want to remove all rows that differ from other rows in that matrix only by having one or two NAs instead of numbers. When two rows match, I would preferably remove the one with more NAs, so that in the end the matrix would look like mat2. Has someone done something similar before?

Thanks for helping,
Tonja

Here is my example:

ex <- c(14, 56, 114, 132, 187, 279, 324, 328, 328, 338, 338, 338, 346, 346, 395, 398, 428, 428, 428, 452, 452, 452, NA, 466, 467, 525,
  894, 923, 968, 980, 1030, 1117, 1156, NA, 1159, 1166, 1166, 1166, 1171, 1171, 1209, 1211, 1235, 1235, 1235, 1275, 1275, 1275, NA, 1291, 1292, 1378,
  829, 851, 880, 893, 929, 1003, 1042, 1045, 1045, 1051, 1051, 1051, 1057, 1057, 1097, 1099, 1119, 1119, 1119, 1147, 1147, 1147, 1147, 1167, 1168, 1235,
  494, 510, 533, 538, 567, 623, 657, 660, 660, 666, 666, 666, 671, 671, 699, 702, NA, 722, 722, NA, NA, 744, 744, 759, 760, 816,
  276, 293, 312, 318, 338, NA, NA, 418, 418, 424, 424, NA, 429, 429, NA, NA, 468, 468, 468, 490, 490, 490, 490, 508, 509, 568,
  674, 696, 726, 734, 774, 851, 893, 896, 896, 903, 903, 903, 908, 908, 944, 947, 966, 966, 966, NA, 998, 998, 998, 1014, 1015, 1091,
  421, 446, 472, 490, 510, 582, 624, 627, 627, 633, 633, NA, 640, 640, 669, 671, 685, 685, 685, 716, 716, 716, 716, 736, 737, 798,
  NA, NA, NA, NA, NA, NA, 74, NA, NA, 82, NA, 82, 86, NA, 104, NA, 114, NA, 114, 119, 119, 119, 119, NA, NA, NA)
mat <- matrix(ex, ncol = 8)

ex2 <- c(14, 56, 114, 132, 187, 279, 324, 328, 338, 346, 395, 398, 428, 452, 466, 467, 525,
  894, 923, 968, 980, 1030, 1117, 1156, 1159, 1166, 1171, 1209, 1211, 1235, 1275, 1291, 1292, 1378,
  829, 851, 880, 893, 929, 1003, 1042, 1045, 1051, 1057, 1097, 1099, 1119, 1147, 1167, 1168, 1235,
  494, 510, 533, 538, 567, 623, 657, 660, 666, 671, 699, 702, 722, 744, 759, 760, 816,
  276, 293, 312, 318, 338, NA, NA, 418, 424, 429, NA, NA, 468, 490, 508, 509, 568,
  674, 696, 726, 734, 774, 851, 893, 896, 903, 908, 944, 947, 966, 998, 1014, 1015, 1091,
  421, 446, 472, 490, 510, 582, 624, 627, 633, 640, 669, 671, 685, 716, 736, 737, 798,
  NA, NA, NA, NA, NA, NA, 74, NA, 82, 86, 104, NA, 114, 119, NA, NA, NA)
mat2 <- matrix(ex2, ncol = 8)
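Both replies compare each row only with its neighbour, so which row of a matching run survives depends on row order. One alternative, not proposed in the thread, is to compare every pair of rows. The sketch below assumes a row should be dropped when some other row agrees with it wherever it has a value and fills in between 1 and max_na_diff of its NAs; exact duplicates keep their first occurrence. The names covers, remove_covered_rows and max_na_diff are hypothetical, and the double loop is O(n^2), which is fine for a matrix of this size but slow for large ones.

```r
# Hypothetical helper: TRUE if row `b` agrees with row `a` wherever `a`
# has a value, i.e. `b` carries at least the same information as `a`.
covers <- function(b, a) {
  all(is.na(a) | (!is.na(b) & a == b))
}

# Hypothetical helper: drop every row that is covered by another row with
# 1..max_na_diff fewer NAs, or that exactly duplicates an earlier row.
remove_covered_rows <- function(m, max_na_diff = 2) {
  n <- nrow(m)
  na_count <- rowSums(is.na(m))
  keep <- rep(TRUE, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j || !covers(m[j, ], m[i, ])) next
      extra <- na_count[i] - na_count[j]  # NAs of row i that row j fills in
      if ((extra > 0 && extra <= max_na_diff) || (extra == 0 && j < i)) {
        keep[i] <- FALSE
        break
      }
    }
  }
  m[keep, , drop = FALSE]
}

# Tonja's example data, one matrix column per line
ex <- c(14, 56, 114, 132, 187, 279, 324, 328, 328, 338, 338, 338, 346, 346, 395, 398, 428, 428, 428, 452, 452, 452, NA, 466, 467, 525,
  894, 923, 968, 980, 1030, 1117, 1156, NA, 1159, 1166, 1166, 1166, 1171, 1171, 1209, 1211, 1235, 1235, 1235, 1275, 1275, 1275, NA, 1291, 1292, 1378,
  829, 851, 880, 893, 929, 1003, 1042, 1045, 1045, 1051, 1051, 1051, 1057, 1057, 1097, 1099, 1119, 1119, 1119, 1147, 1147, 1147, 1147, 1167, 1168, 1235,
  494, 510, 533, 538, 567, 623, 657, 660, 660, 666, 666, 666, 671, 671, 699, 702, NA, 722, 722, NA, NA, 744, 744, 759, 760, 816,
  276, 293, 312, 318, 338, NA, NA, 418, 418, 424, 424, NA, 429, 429, NA, NA, 468, 468, 468, 490, 490, 490, 490, 508, 509, 568,
  674, 696, 726, 734, 774, 851, 893, 896, 896, 903, 903, 903, 908, 908, 944, 947, 966, 966, 966, NA, 998, 998, 998, 1014, 1015, 1091,
  421, 446, 472, 490, 510, 582, 624, 627, 627, 633, 633, NA, 640, 640, 669, 671, 685, 685, 685, 716, 716, 716, 716, 736, 737, 798,
  NA, NA, NA, NA, NA, NA, 74, NA, NA, 82, NA, 82, 86, NA, 104, NA, 114, NA, 114, 119, 119, 119, 119, NA, NA, NA)
mat <- matrix(ex, ncol = 8)

res <- remove_covered_rows(mat)
nrow(res)  # 17
```

With this data the sketch keeps rows 1-7, 9, 10, 13, 15, 16, 19, 22, 24, 25 and 26 of mat, which correspond to the 17 rows of mat2. Setting max_na_diff = Inf would drop a row however many NAs the covering row fills in.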