Re: [R] unqiue problem

Assa Yeroslaviz Mon, 14 Jun 2010 09:33:35 -0700

I thought unique() deleted the whole line.
I don't really need the row names; I only thought of them as a way of getting
the unique items.
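
For reference, a small sketch of what unique() does on a data frame: it
compares entire rows, so an identifier survives more than once whenever its
second column differs. The toy data and the column names `id` and `value`
below are made up for illustration and are not taken from Workbook5.txt.

```r
# Hypothetical toy data standing in for the two-column file.
df <- data.frame(id    = c("A_51_P1", "A_51_P1", "A_51_P2", "A_51_P2"),
                 value = c(1, 1, 5, 7),
                 stringsAsFactors = FALSE)

# unique() drops only rows that are identical in every column.
u <- unique(df)

nrow(u)              # the exact repeat of ("A_51_P1", 1) is gone
anyDuplicated(u$id)  # non-zero: "A_51_P2" still occurs twice in column 1
```

This is why the row count can drop while rownames<- still complains about
duplicates.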


Is there a way of deleting whole lines completely according to their
identifiers?

What I really need are unique values on the first column.
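
One possible way to get that, sketched on made-up data (the names `df`,
`id`, `value` and `dedup` are assumptions for illustration): duplicated()
applied to the first column flags every repeat of an identifier, so negating
it keeps the first row per identifier and discards the rest.

```r
# Hypothetical toy data; the real input would come from read.delim().
df <- data.frame(id    = c("A_51_P1", "A_51_P2", "A_51_P1", "A_51_P3"),
                 value = c(10, 20, 30, 40),
                 stringsAsFactors = FALSE)

# duplicated() is TRUE for the second and later occurrences of each id,
# so this keeps only the first row seen for every identifier.
dedup <- df[!duplicated(df[, 1]), ]

# The first column is now unique, so it can be used as row names.
rownames(dedup) <- dedup[, 1]
```

Note that this silently drops the second-column values of the later rows
(here the 30 for "A_51_P1"); whether the first occurrence is the right one to
keep depends on the data.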

Assa

On Mon, Jun 14, 2010 at 18:04, jim holtman <jholt...@gmail.com> wrote:

> Your process does remove all the duplicate entries based on the
> content of the two columns.  After you do this, there are still
> duplicate entries in the first column that you are trying to use as
> rownames and therefore the error.  Why do you want to use non-unique
> entries as rownames?  Do you really need the row names, or should you
> only be keeping unique values for the first column?
>
> On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz <fry...@gmail.com> wrote:
> > Hello everybody,
> >
> > I have a matrix of 2 columns and over 27k rows.
> > Some of the rows are duplicated, so I tried to remove them with the command
> > unique():
> >
> >> Workbook5 <- read.delim(file =  "Workbook5.txt")
> >> dim(Workbook5)
> > [1] 27748     2
> >> Workbook5 <- unique(Workbook5)
> >> dim(Workbook5)
> > [1] 20101     2
> >
> > It removed a lot of lines, but unfortunately not all of them. I wanted to add
> > the row names to the matrix and got this error message:
> >> rownames(Workbook5) <- Workbook5[,1]
> > Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L,  :
> >  duplicate 'row.names' are not allowed
> > In addition: Warning message:
> > non-unique values when setting 'row.names': ‘A_51_P102339’,
> > ‘A_51_P102518’, ‘A_51_P103435’, ‘A_51_P103465’,
> > ‘A_51_P103594’, ‘A_51_P104409’, ‘A_51_P104718’,
> > ‘A_51_P105869’, ‘A_51_P106428’, ‘A_51_P106799’,
> > ‘A_51_P107176’, ‘A_51_P107959’, ‘A_51_P108767’,
> > ‘A_51_P109258’, ‘A_51_P109708’, ‘A_51_P110341’,
> > ‘A_51_P111757’, ‘A_51_P112427’, ‘A_51_P112662’,
> > ‘A_51_P113672’, ‘A_51_P115018’, ‘A_51_P116496’,
> > ‘A_51_P116636’, ‘A_51_P117666’, ‘A_51_P118132’,
> > ‘A_51_P118168’, ‘A_51_P118400’, ‘A_51_P118506’,
> > ‘A_51_P119315’, ‘A_51_P120093’, ‘A_51_P120305’,
> > ‘A_51_P120738’, ‘A_51_P120785’, ‘A_51_P121134’,
> > ‘A_51_P121359’, ‘A_51_P121412’, ‘A_51_P121652’,
> > ‘A_51_P121724’, ‘A_51_P121829’, ‘A_51_P122141’,
> > ‘A_51_P122964’, ‘A_51_P123422’, ‘A_51_P123895’,
> > ‘A_51_P124008’, ‘A_51_P124719’, ‘A_51_P125648’,
> > ‘A_51_P125679’, ‘A_51_P125779’ [... truncated]
> >
> > Is there a better way to discard the duplications in the text file (an Excel
> > file is the origin)?
> >
> >> R.version
> >               _
> > platform       x86_64-apple-darwin9.8.0
> > arch           x86_64
> > os             darwin9.8.0
> > system         x86_64, darwin9.8.0
> > status         Patched
> > major          2
> > minor          11.1
> > year           2010
> > month          06
> > day            03
> > svn rev        52201
> > language       R
> > version.string R version 2.11.1 Patched (2010-06-03 r52201)
> >
> > THX
> >
> > Assa
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>
