Hi again,

Your problem as you formulated it is not clearly defined.
For example, what do you want to do with this matrix:

  > x <- matrix(c(1, NA, 3, NA, 2, 3), ncol=3, byrow=TRUE)
  > x
       [,1] [,2] [,3]
  [1,]    1   NA    3
  [2,]   NA    2    3

Remove row 1, row 2 or nothing?

Maybe you want to proceed in 2 steps:
  (1) remove strictly duplicated rows
  (2) remove rows with at least one NA that match a row with no NAs

In this case you would not remove any row from x.
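
To make the ambiguity concrete, here is a small sketch. Note that
rowsCompatible() is a made-up helper for illustration, not a base R
function: it only checks whether two rows agree wherever both are
non-NA. The two rows of x are compatible with each other, but neither
is a complete row, so step (2) has no reference row to match against.

```r
## Hypothetical helper (not part of base R): TRUE if rows a and b
## agree in every column where both are non-NA.
rowsCompatible <- function(a, b)
{
    both <- !is.na(a) & !is.na(b)
    all(a[both] == b[both])
}

x <- matrix(c(1, NA, 3, NA, 2, 3), ncol=3, byrow=TRUE)

rowsCompatible(x[1, ], x[2, ])   # TRUE: they only share column 3
any(is.na(x[1, ]))               # TRUE: row 1 is not complete
any(is.na(x[2, ]))               # TRUE: row 2 is not complete
```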

The removeLooseDupRows() function below does (2) only. If you
want (1) and (2), you need to combine it with unique() by doing
either removeLooseDupRows(unique(x)) or unique(removeLooseDupRows(x))
(both should always give the same result).

removeLooseDupRows <- function(x)
{
    if (nrow(x) <= 1)
        return(x)
    ## Rows without any NA are the only possible reference rows.
    has_na <- rowSums(is.na(x)) > 0
    dup_index <- logical(nrow(x))
    for (i in which(has_na)) {
        ## The NAs in row i act as wildcards: compare only its
        ## non-NA columns against each complete row.
        known <- !is.na(x[i, ])
        for (i0 in which(!has_na)) {
            if (all(x[i, known] == x[i0, known])) {
                dup_index[i] <- TRUE
                break
            }
        }
    }
    ## drop=FALSE keeps the result a matrix even if one row remains.
    x[!dup_index, , drop=FALSE]
}

  > x <- matrix((1:3), 5, 3)
  > x[4,2] = NA
  > x[3,3] = NA
  > x
       [,1] [,2] [,3]
  [1,]    1    3    2
  [2,]    2    1    3
  [3,]    3    2   NA
  [4,]    1   NA    2
  [5,]    2    1    3

  > removeLooseDupRows(x)
       [,1] [,2] [,3]
  [1,]    1    3    2
  [2,]    2    1    3
  [3,]    3    2   NA
  [4,]    2    1    3

  > removeLooseDupRows(unique(x))
       [,1] [,2] [,3]
  [1,]    1    3    2
  [2,]    2    1    3
  [3,]    3    2   NA
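
If you would rather get a logical index, in the spirit of duplicated(),
something along these lines might work. This is only a sketch:
looseDuplicated() is a name I made up, it assumes x is a matrix, and it
compares every row with NAs against every complete row, so it will be
slow on large matrices.

```r
## Hypothetical helper, modelled on duplicated(): returns a logical
## vector flagging the rows to drop under steps (1) and (2).
looseDuplicated <- function(x)
{
    has_na <- rowSums(is.na(x)) > 0
    complete <- which(!has_na)
    ## Step (2): a row with NAs is flagged if some complete row agrees
    ## with it on all of its non-NA columns.
    matches_complete <- vapply(seq_len(nrow(x)), function(i) {
        if (!has_na[i])
            return(FALSE)
        known <- !is.na(x[i, ])
        any(vapply(complete,
                   function(i0) all(x[i, known] == x[i0, known]),
                   logical(1)))
    }, logical(1))
    ## Step (1): strict duplicates, as reported by duplicated().
    duplicated(x) | matches_complete
}

x <- matrix(1:3, 5, 3)
x[4, 2] <- NA
x[3, 3] <- NA
res <- x[!looseDuplicated(x), , drop=FALSE]
res
```

On the example matrix this keeps rows 1, 2 and 3, which is the result
asked for in the original post.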


Cheers,
H.


Quoting [EMAIL PROTECTED]:

> Quoting Petr Pikal <[EMAIL PROTECTED]>:
> 
> > Hi
> > 
> > It's a bit tricky, but
> > 
> > dup<-apply(x, 2, duplicated) # which are duplicated
> > isna<-apply(x, 2, is.na) # which are NA
> > check<-dup|isna # which are either
> > 
> > and here is your result
> > 
> > x[rowSums(check)!=3,]
> >      [,1] [,2] [,3]
> > [1,]    1    3    2
> > [2,]    2    1    3
> > [3,]    3    2   NA
> 
> Hi,
> 
> The above doesn't work, and you don't even need NAs in x to see it:
> 
>   > x <- matrix(c(2,2,1,3,2,3), ncol=2, byrow=TRUE)
>   > x
>        [,1] [,2]
>   [1,]    2    2
>   [2,]    1    3
>   [3,]    2    3
> 
>   > dup <- apply(x, 2, duplicated)
>   > x[rowSums(dup)!=2 ,]
>        [,1] [,2]
>   [1,]    2    2
>   [2,]    1    3
> 
> Look at 'dup':
> 
>   > dup
>         [,1]  [,2]
>   [1,] FALSE FALSE
>   [2,] FALSE FALSE
>   [3,]  TRUE  TRUE
> 
> Yes, each element in the last row is a duplicate in its own col,
> but this doesn't mean that the row as a whole is a duplicate.
> 
> Cheers,
> H.
> 
> 
> > 
> > 
> > Regards
> > Petr
> > 
> > 
> > 
> > 
> > On 8 Mar 2007 at 10:14, stacey thompson wrote:
> > 
> > Date sent:          Thu, 8 Mar 2007 10:14:37 -0500
> > From:               "stacey thompson" <[EMAIL PROTECTED]>
> > To:                 r-help@stat.math.ethz.ch
> > Subject:            [R] Removing duplicated rows within a matrix,
> >     with missing data as wildcards
> > 
> > > I'd like to remove duplicated rows within a matrix, with missing data
> > > being treated as wildcards.
> > > 
> > > For example
> > > 
> > > > x <- matrix((1:3), 5, 3)
> > > > x[4,2] = NA
> > > > x[3,3] = NA
> > > > x
> > > 
> > >      [,1] [,2] [,3]
> > > [1,]    1    3    2
> > > [2,]    2    1    3
> > > [3,]    3    2   NA
> > > [4,]    1   NA    2
> > > [5,]    2    1    3
> > > 
> > > I would like to obtain
> > > 
> > >       [,1] [,2] [,3]
> > > [1,]    1    3    2
> > > [2,]    2    1    3
> > > [3,]    3    2   NA
> > > 
> > > From the R-help archives, I learned about unique(x) and
> > > duplicated(x).
> > > However, unique(x) returns
> > > 
> > > > unique(x)
> > > 
> > >      [,1] [,2] [,3]
> > > [1,]    1    3    2
> > > [2,]    2    1    3
> > > [3,]    3    2   NA
> > > [4,]    1   NA    2
> > > 
> > > and duplicated(x) gives
> > > 
> > > > duplicated(x)
> > > 
> > > [1] FALSE FALSE FALSE FALSE  TRUE
> > > 
> > > I have tried various na.action 's but with unique(x) I get errors at
> > > best.
> > > 
> > > e.g.
> > > > unique(x, na.omit(x))
> > > 
> > > Error: argument 'incomparables != FALSE' is not used (yet)
> > > 
> > > How might I tackle this?
> > > 
> > > Thanks,
> > > 
> > > -stacey
> > > 
> > > -- 
> > > -stacey lee thompson-
> > > Stagiaire post-doctorale
> > > Institut de recherche en biologie végétale
> > > Université de Montréal
> > > 4101 Sherbrooke Est
> > > Montréal, Québec H1X 2B2 Canada
> > > [EMAIL PROTECTED]
> > > 
> > > ______________________________________________
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html and provide commented,
> > > minimal, self-contained, reproducible code.
> > 
> > Petr Pikal
> > [EMAIL PROTECTED]
> > 
> 
