Hello,

Here's another close solution.


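# Drop rows with more than two NAs
na_count <- rowSums(is.na(mat))
mat1 <- mat[na_count <= 2, ]
# Row-to-row differences; prepend the first row so there is one row per row of mat1
diff_mat1 <- rbind( mat1[1, ], apply(mat1, 2, diff) )
# A row "repeats" the previous one if every difference is 0 or NA
no <- is.na(diff_mat1) | diff_mat1 == 0
yes <- !apply(no, 1, all)
mat1.1 <- mat1[yes, ]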

all.equal( mat1.1, mat2 )  # Not quite

why1 <- 1*(is.na(mat1.1) & is.na(mat2))
why2 <- 1*(is.na(mat1.1) | is.na(mat2))
sum(why1); sum(why2)

why2 - why1

Why: in a run of "equal" rows, the first is always kept, even if it has an NA where the others don't. So maybe the OP could now use a similar method, but starting from the bottom, and then, from the two solutions, keep the rows with fewer NAs.
I'll give it some thought later.
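
A rough sketch of the "keep the row with fewer NAs" idea, on a small made-up matrix. The names toy, same_as_prev, grp and keep are hypothetical, and it assumes that similar rows are adjacent, as they appear to be in the example data. It is not tested against mat and mat2.

```r
# Toy data (made up): rows 2-4 agree wherever both values are non-NA
toy <- rbind(c(1, 10, 100),
             c(2, NA, 200),
             c(2, 20, 200),
             c(2, 20,  NA),
             c(3, 30, 300))

# TRUE where row i matches row i-1 in every column in which both are non-NA
same_as_prev <- function(m) {
  d <- diff(m)                       # column-wise differences between rows
  c(FALSE, apply(is.na(d) | d == 0, 1, all))
}

grp  <- cumsum(!same_as_prev(toy))   # run id for consecutive "equal" rows
n_na <- rowSums(is.na(toy))          # number of NAs per row

# Within each run, keep the first row that has the fewest NAs
keep <- unname(unlist(lapply(split(seq_len(nrow(toy)), grp),
                             function(i) i[which.min(n_na[i])])))
toy[keep, ]                          # rows 1, 3 and 5 of toy
```

One caveat: "equal to the previous row" is not transitive when NAs are involved, so a long run could chain together rows that would not match each other directly.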

Hope this helps,

Rui Barradas

On 23-08-2012 13:09, PIKAL Petr wrote:
Hi

I cannot reproduce exactly what you want but maybe you can elaborate this to 
suit your needs.

sel1<-rowSums(is.na(mat)) # number of NA values
sel2<-c(0,rowSums(apply(mat,2,diff)==0, na.rm=T)) # rows which are same

but first row is not considered same, therefore I add also the first row

sel<-c(rowSums(embed(sel2,2)),0)

and here I select only rows which are unique and do not have any NA
mat[(sel1*sel)==0,]

This is not exactly what you want, as one of the rows starting with 328 should be
included. So there has to be another trick, but I cannot come up with one.
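
To see how a row can go missing, here is the same selection applied to a small made-up matrix (toy is hypothetical, not the poster's data). The two rows starting with 2 match each other, but each has an NA, so both are dropped, which looks like the situation with the rows starting 328.

```r
# Toy data (made up): rows 2 and 3 agree wherever both values are non-NA
toy <- rbind(c(1, 10, 100),
             c(2, NA, 200),
             c(2, 20,  NA),
             c(3, 30, 300))

sel1 <- rowSums(is.na(toy))                                  # NAs per row
sel2 <- c(0, rowSums(apply(toy, 2, diff) == 0, na.rm = TRUE)) # zero diffs vs. previous row
sel  <- c(rowSums(embed(sel2, 2)), 0)                        # also flag the first row of a run

sel1          # 0 1 1 0
sel           # 0 1 1 0
sel1 * sel    # 0 1 1 0: both rows starting with 2 are flagged

toy[(sel1 * sel) == 0, ]   # only rows 1 and 4 remain
```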

Regards
Petr

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
On Behalf Of Tonja Krueger
Sent: Wednesday, August 22, 2012 10:16 AM
To: r-help@r-project.org
Subject: [R] Remove similar rows from matrix


    Hi everybody,

    I have a matrix (mat) from which I want to remove all rows that differ
    from other rows in that matrix only by having one or two NAs instead of
    numbers.

    I would preferably like to remove the rows with more NAs, so in the end
    the matrix would look like mat2.

    Has someone done something similar before? Thanks for helping, Tonja


    Here my example:

    ex <- c(14, 56, 114, 132, 187, 279, 324, 328, 328, 338, 338, 338,
346, 346,
    395, 398, 428, 428, 428, 452, 452, 452, NA, 466, 467, 525, 894, 923,
968,
    980, 1030, 1117, 1156, NA, 1159, 1166, 1166, 1166, 1171, 1171, 1209,
1211,
    1235, 1235, 1235, 1275, 1275, 1275, NA, 1291, 1292, 1378, 829, 851,
880,
    893, 929, 1003, 1042, 1045, 1045, 1051, 1051, 1051, 1057, 1057,
1097, 1099,
    1119, 1119, 1119, 1147, 1147, 1147, 1147, 1167, 1168, 1235, 494,
510, 533,
    538, 567, 623, 657, 660, 660, 666, 666, 666, 671, 671, 699, 702, NA,
722,
    722, NA, NA, 744, 744, 759, 760, 816, 276, 293, 312, 318, 338, NA,
NA, 418,
    418, 424, 424, NA, 429, 429, NA, NA, 468, 468, 468, 490, 490, 490,
490, 508,
    509, 568, 674, 696, 726, 734, 774, 851, 893, 896, 896, 903, 903,
903, 908,
    908, 944, 947, 966, 966, 966, NA, 998, 998, 998, 1014, 1015, 1091,
421, 446,
    472, 490, 510, 582, 624, 627, 627, 633, 633, NA, 640, 640, 669, 671,
685,
    685, 685, 716, 716, 716, 716, 736, 737, 798, NA, NA, NA, NA, NA, NA,
74, NA,
    NA, 82, NA, 82, 86, NA, 104, NA, 114, NA, 114, 119, 119, 119, 119,
NA, NA,
    NA)

    mat <- matrix(ex, ncol=8)


    ex2 <- c(14, 56, 114, 132, 187, 279, 324, 328, 338, 346, 395, 398,
428, 452,
    466, 467, 525, 894, 923, 968, 980, 1030, 1117, 156, 1159, 1166,
1171, 1209,
    1211, 1235, 1275, 1291, 1292, 1378, 829, 851, 880, 893, 929, 1003,
1042,
    1045, 1051, 1057, 1097, 1099, 1119, 1147, 1167, 1168, 1235, 494,
510, 533,
    538, 567, 623, 657, 660, 666, 671, 699, 702, 722, 744, 759, 760,
816, 276,
    293, 312, 318, 338, NA, NA, 418, 424, 429, NA, NA, 468, 490, 508,
509, 568,
    674, 696, 726, 734, 774, 851, 893, 896, 903, 908, 944, 947, 966,
998, 1014,
    1015, 1091, 421, 446, 472, 490, 510, 582, 624, 627, 633, 640, 669,
671, 685,
    716, 736, 737, 798, NA, NA, NA, NA, NA, NA, 74, NA, 82, 86, 104, NA,
114,
    119, NA, NA, NA)

    mat2 <- matrix(ex2, ncol=8)
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.