Re: [R] unqiue problem
On Jun 14, 2010, at 1:10 PM, David Winsemius wrote:

> On Jun 14, 2010, at 12:32 PM, Assa Yeroslaviz wrote:
>
> > I thought unique() deletes the whole line. I don't really need the row names, but I thought of them as a way of getting the unique items. Is there a way of deleting whole lines completely according to their identifiers? What I really need are unique values in the first column.
> >
> > Assa
> >
> > On Mon, Jun 14, 2010 at 18:04, jim holtman wrote:
> >
> > > Your process does remove all the duplicate entries based on the content of the two columns. After you do this, there are still duplicate entries in the first column that you are trying to use as rownames, hence the error. Why do you want to use non-unique entries as rownames? Do you really need the row names, or should you only be keeping unique values for the first column?
> > >
> > > On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz wrote:
> > >
> > > > Hello everybody,
> > > >
> > > > I have a matrix of 2 columns and over 27k rows. Some of the rows are duplicated, so I tried to remove them with the command unique():
> > > >
> > > >> Workbook5 <- read.delim(file = "Workbook5.txt")
> > > >> dim(Workbook5)
> > > > [1] 27748 2
> > > >> Workbook5 <- unique(Workbook5)
>
> Jim already showed you one way in another thread and it is probably more intuitive than this way, but just so you know...
>
>     Workbook5 <- Workbook5[ unique(Workbook5[ ,1] ) , ]
>
> ... should have worked. Logical indexing on first column with return of both columns of qualifying rows.

Actually I was thinking a bit askew, although that would have succeeded. That was not logical indexing, which would have been done with duplicated(), or rather its negation through the use of the "!" unary operator:

    > str(unique(Workbook5[ ,1] ) )
     Factor w/ 17209 levels "A_51_P100034",..: 1 2 3 4 5 6 7 8 9 10 ...
    > str(!duplicated(Workbook5[ ,1] ) )
     logi [1:20101] TRUE TRUE TRUE TRUE TRUE TRUE ...

So this would have been the way to do it with logical indexing:

    Workbook5 <- Workbook5[ !duplicated(Workbook5[ ,1] ) , ]

-- 
David.

> > > >> dim(Workbook5)
> > > > [1] 20101 2
> > > >
> > > > It removed a lot of lines, but unfortunately not all of them. I wanted to add the row names to the matrix and got this error message:
> > > >
> > > >> rownames(Workbook5) <- Workbook5[,1]
> > > > Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L, :
> > > >   duplicate 'row.names' are not allowed
> > > > In addition: Warning message:
> > > > non-unique values when setting 'row.names': ‘A_51_P102339’,
> > > > ‘A_51_P102518’, ‘A_51_P103435’, ‘A_51_P103465’,
> > > > ‘A_51_P103594’, ‘A_51_P104409’, ‘A_51_P104718’,
> > > > ‘A_51_P105869’, ‘A_51_P106428’, ‘A_51_P106799’,
> > > > ‘A_51_P107176’, ‘A_51_P107959’, ‘A_51_P108767’,
> > > > ‘A_51_P109258’, ‘A_51_P109708’, ‘A_51_P110341’,
> > > > ‘A_51_P111757’, ‘A_51_P112427’, ‘A_51_P112662’,
> > > > ‘A_51_P113672’, ‘A_51_P115018’, ‘A_51_P116496’,
> > > > ‘A_51_P116636’, ‘A_51_P117666’, ‘A_51_P118132’,
> > > > ‘A_51_P118168’, ‘A_51_P118400’, ‘A_51_P118506’,
> > > > ‘A_51_P119315’, ‘A_51_P120093’, ‘A_51_P120305’,
> > > > ‘A_51_P120738’, ‘A_51_P120785’, ‘A_51_P121134’,
> > > > ‘A_51_P121359’, ‘A_51_P121412’, ‘A_51_P121652’,
> > > > ‘A_51_P121724’, ‘A_51_P121829’, ‘A_51_P122141’,
> > > > ‘A_51_P122964’, ‘A_51_P123422’, ‘A_51_P123895’,
> > > > ‘A_51_P124008’, ‘A_51_P124719’, ‘A_51_P125648’,
> > > > ‘A_51_P125679’, ‘A_51_P125779 [... truncated]
> > > >
> > > > Is there a better way to discard the duplications in the text file (an Excel file is the origin)?
> > > >
> > > >> R.version
> > > >                _
> > > > platform       x86_64-apple-darwin9.8.0
> > > > arch           x86_64
> > > > os             darwin9.8.0
> > > > system         x86_64, darwin9.8.0
> > > > status         Patched
> > > > major          2
> > > > minor          11.1
> > > > year           2010
> > > > month          06
> > > > day            03
> > > > svn rev        52201
> > > > language       R
> > > > version.string R version 2.11.1 Patched (2010-06-03 r52201)
> > > >
> > > > THX
> > > >
> > > > Assa

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
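
The difference between de-duplicating whole rows and de-duplicating on the first column can be checked on a small made-up data frame. This is only a sketch: the data frame `wb` and its column names `probe` and `symbol` are invented for illustration and are not taken from Workbook5.txt.

```r
# Toy stand-in for Workbook5 (hypothetical column names and values).
wb <- data.frame(
  probe  = c("A_1", "A_1", "A_2", "A_2", "A_3"),
  symbol = c("x",   "x",   "y",   "z",   "w"),
  stringsAsFactors = FALSE
)

# unique() on a data frame compares whole rows, so A_2 survives twice
# because its second-column values differ.
by_row <- unique(wb)
nrow(by_row)                     # 4
anyDuplicated(by_row$probe) > 0  # TRUE: row names would still clash

# Logical indexing: keep the first row seen for each probe.
keep <- !duplicated(wb$probe)
by_probe <- wb[keep, ]
nrow(by_probe)                   # 3

# With unique identifiers the row-name assignment no longer fails.
rownames(by_probe) <- by_probe$probe
```

Note that `!duplicated()` keeps whichever row comes first for each identifier, so the rows should be ordered beforehand if a particular one is preferred.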
Re: [R] unqiue problem
On Jun 14, 2010, at 12:32 PM, Assa Yeroslaviz wrote:

> I thought unique() deletes the whole line. I don't really need the row names, but I thought of them as a way of getting the unique items. Is there a way of deleting whole lines completely according to their identifiers? What I really need are unique values in the first column.
>
> Assa
>
> On Mon, Jun 14, 2010 at 18:04, jim holtman wrote:
>
> > Your process does remove all the duplicate entries based on the content of the two columns. After you do this, there are still duplicate entries in the first column that you are trying to use as rownames, hence the error. Why do you want to use non-unique entries as rownames? Do you really need the row names, or should you only be keeping unique values for the first column?
> >
> > On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz wrote:
> >
> > > Hello everybody,
> > >
> > > I have a matrix of 2 columns and over 27k rows. Some of the rows are duplicated, so I tried to remove them with the command unique():
> > >
> > >> Workbook5 <- read.delim(file = "Workbook5.txt")
> > >> dim(Workbook5)
> > > [1] 27748 2
> > >> Workbook5 <- unique(Workbook5)

Jim already showed you one way in another thread and it is probably more intuitive than this way, but just so you know...

    Workbook5 <- Workbook5[ unique(Workbook5[ ,1] ) , ]

... should have worked. Logical indexing on first column with return of both columns of qualifying rows.

-- 
David.

> > >> dim(Workbook5)
> > > [1] 20101 2
> > >
> > > It removed a lot of lines, but unfortunately not all of them.
> > > I wanted to add the row names to the matrix and got this error message:
> > >
> > >> rownames(Workbook5) <- Workbook5[,1]
> > > Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L, :
> > >   duplicate 'row.names' are not allowed
> > > In addition: Warning message:
> > > non-unique values when setting 'row.names': ‘A_51_P102339’,
> > > ‘A_51_P102518’, ‘A_51_P103435’, ‘A_51_P103465’,
> > > ‘A_51_P103594’, ‘A_51_P104409’, ‘A_51_P104718’,
> > > ‘A_51_P105869’, ‘A_51_P106428’, ‘A_51_P106799’,
> > > ‘A_51_P107176’, ‘A_51_P107959’, ‘A_51_P108767’,
> > > ‘A_51_P109258’, ‘A_51_P109708’, ‘A_51_P110341’,
> > > ‘A_51_P111757’, ‘A_51_P112427’, ‘A_51_P112662’,
> > > ‘A_51_P113672’, ‘A_51_P115018’, ‘A_51_P116496’,
> > > ‘A_51_P116636’, ‘A_51_P117666’, ‘A_51_P118132’,
> > > ‘A_51_P118168’, ‘A_51_P118400’, ‘A_51_P118506’,
> > > ‘A_51_P119315’, ‘A_51_P120093’, ‘A_51_P120305’,
> > > ‘A_51_P120738’, ‘A_51_P120785’, ‘A_51_P121134’,
> > > ‘A_51_P121359’, ‘A_51_P121412’, ‘A_51_P121652’,
> > > ‘A_51_P121724’, ‘A_51_P121829’, ‘A_51_P122141’,
> > > ‘A_51_P122964’, ‘A_51_P123422’, ‘A_51_P123895’,
> > > ‘A_51_P124008’, ‘A_51_P124719’, ‘A_51_P125648’,
> > > ‘A_51_P125679’, ‘A_51_P125779 [... truncated]
> > >
> > > Is there a better way to discard the duplications in the text file (an Excel file is the origin)?
> > >
> > >> R.version
> > >                _
> > > platform       x86_64-apple-darwin9.8.0
> > > arch           x86_64
> > > os             darwin9.8.0
> > > system         x86_64, darwin9.8.0
> > > status         Patched
> > > major          2
> > > minor          11.1
> > > year           2010
> > > month          06
> > > day            03
> > > svn rev        52201
> > > language       R
> > > version.string R version 2.11.1 Patched (2010-06-03 r52201)
> > >
> > > THX
> > >
> > > Assa
> >
> > -- 
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> >
> > What is the problem that you are trying to solve?

David Winsemius, MD
West Hartford, CT
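
A caution on indexing with the result of unique(): it returns values, not a per-row TRUE/FALSE mask, so using it as a row index is not logical indexing. According to ?Extract, a factor used as an index is treated as its integer codes, which need not point at the rows one wants. A small sketch with made-up data (the vector `ids` is hypothetical, not from Workbook5.txt):

```r
# Hypothetical identifiers, stored as a factor, as read.delim() produced
# by default in R 2.x (stringsAsFactors = TRUE).
ids <- factor(c("b", "a", "b", "c"))

u <- unique(ids)       # values b, a, c (levels are a, b, c)
as.integer(u)          # 2 1 3, the integer codes of those values

# Indexing by a factor uses the codes, i.e. positions 2, 1, 3,
# not "the first element for each value".
picked <- ids[u]
as.character(picked)   # "a" "b" "b": "b" twice, "c" missing

# The logical mask has one TRUE/FALSE per element and selects correctly.
mask <- !duplicated(ids)
as.character(ids[mask])  # "b" "a" "c"
```

On data where the identifiers happen to be sorted and unrepeated the two forms can agree by accident, which is one reason the logical form is the safer habit.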
Re: [R] unqiue problem
I thought unique() deletes the whole line. I don't really need the row names, but I thought of them as a way of getting the unique items. Is there a way of deleting whole lines completely according to their identifiers? What I really need are unique values in the first column.

Assa

On Mon, Jun 14, 2010 at 18:04, jim holtman wrote:

> Your process does remove all the duplicate entries based on the
> content of the two columns. After you do this, there are still
> duplicate entries in the first column that you are trying to use as
> rownames, hence the error. Why do you want to use non-unique
> entries as rownames? Do you really need the row names, or should you
> only be keeping unique values for the first column?
>
> On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz wrote:
> > Hello everybody,
> >
> > I have a matrix of 2 columns and over 27k rows.
> > Some of the rows are duplicated, so I tried to remove them with the command
> > unique():
> >
> >> Workbook5 <- read.delim(file = "Workbook5.txt")
> >> dim(Workbook5)
> > [1] 27748 2
> >> Workbook5 <- unique(Workbook5)
> >> dim(Workbook5)
> > [1] 20101 2
> >
> > It removed a lot of lines, but unfortunately not all of them.
> > I wanted to add the row names to the matrix and got this error message:
> >
> >> rownames(Workbook5) <- Workbook5[,1]
> > Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L, :
> >   duplicate 'row.names' are not allowed
> > In addition: Warning message:
> > non-unique values when setting 'row.names': ‘A_51_P102339’,
> > ‘A_51_P102518’, ‘A_51_P103435’, ‘A_51_P103465’,
> > ‘A_51_P103594’, ‘A_51_P104409’, ‘A_51_P104718’,
> > ‘A_51_P105869’, ‘A_51_P106428’, ‘A_51_P106799’,
> > ‘A_51_P107176’, ‘A_51_P107959’, ‘A_51_P108767’,
> > ‘A_51_P109258’, ‘A_51_P109708’, ‘A_51_P110341’,
> > ‘A_51_P111757’, ‘A_51_P112427’, ‘A_51_P112662’,
> > ‘A_51_P113672’, ‘A_51_P115018’, ‘A_51_P116496’,
> > ‘A_51_P116636’, ‘A_51_P117666’, ‘A_51_P118132’,
> > ‘A_51_P118168’, ‘A_51_P118400’, ‘A_51_P118506’,
> > ‘A_51_P119315’, ‘A_51_P120093’, ‘A_51_P120305’,
> > ‘A_51_P120738’, ‘A_51_P120785’, ‘A_51_P121134’,
> > ‘A_51_P121359’, ‘A_51_P121412’, ‘A_51_P121652’,
> > ‘A_51_P121724’, ‘A_51_P121829’, ‘A_51_P122141’,
> > ‘A_51_P122964’, ‘A_51_P123422’, ‘A_51_P123895’,
> > ‘A_51_P124008’, ‘A_51_P124719’, ‘A_51_P125648’,
> > ‘A_51_P125679’, ‘A_51_P125779 [... truncated]
> >
> > Is there a better way to discard the duplications in the text file (an Excel
> > file is the origin)?
> >
> >> R.version
> >                _
> > platform       x86_64-apple-darwin9.8.0
> > arch           x86_64
> > os             darwin9.8.0
> > system         x86_64, darwin9.8.0
> > status         Patched
> > major          2
> > minor          11.1
> > year           2010
> > month          06
> > day            03
> > svn rev        52201
> > language       R
> > version.string R version 2.11.1 Patched (2010-06-03 r52201)
> >
> > THX
> >
> > Assa
>
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
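
The request to delete whole lines according to their identifiers can be read two ways: keep one row per identifier, or drop every row whose identifier occurs more than once. A hedged sketch of both readings on invented data (`wb`, `probe` and `symbol` are hypothetical names, not taken from the thread):

```r
# Toy data frame standing in for Workbook5 (values are made up).
wb <- data.frame(
  probe  = c("A_1", "A_2", "A_2", "A_3", "A_3", "A_4"),
  symbol = c("x",   "y",   "z",   "w",   "w",   "v"),
  stringsAsFactors = FALSE
)

# Reading 1: one row per identifier, keeping the first occurrence.
first_only <- wb[!duplicated(wb$probe), ]

# Reading 2: remove every row whose identifier is repeated at all.
# duplicated() flags later copies; fromLast = TRUE flags earlier ones.
repeated <- duplicated(wb$probe) | duplicated(wb$probe, fromLast = TRUE)
singletons <- wb[!repeated, ]

first_only$probe   # "A_1" "A_2" "A_3" "A_4"
singletons$probe   # "A_1" "A_4"
```

Which reading is appropriate depends on whether the differing second-column values for a repeated identifier carry information that should not be discarded silently.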
Re: [R] unqiue problem
Your process does remove all the duplicate entries based on the content of the two columns. After you do this, there are still duplicate entries in the first column that you are trying to use as rownames, hence the error. Why do you want to use non-unique entries as rownames? Do you really need the row names, or should you only be keeping unique values for the first column?

On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz wrote:
> Hello everybody,
>
> I have a matrix of 2 columns and over 27k rows.
> Some of the rows are duplicated, so I tried to remove them with the command
> unique():
>
>> Workbook5 <- read.delim(file = "Workbook5.txt")
>> dim(Workbook5)
> [1] 27748 2
>> Workbook5 <- unique(Workbook5)
>> dim(Workbook5)
> [1] 20101 2
>
> It removed a lot of lines, but unfortunately not all of them. I wanted to add
> the row names to the matrix and got this error message:
>> rownames(Workbook5) <- Workbook5[,1]
> Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L, :
>   duplicate 'row.names' are not allowed
> In addition: Warning message:
> non-unique values when setting 'row.names': ‘A_51_P102339’,
> ‘A_51_P102518’, ‘A_51_P103435’, ‘A_51_P103465’,
> ‘A_51_P103594’, ‘A_51_P104409’, ‘A_51_P104718’,
> ‘A_51_P105869’, ‘A_51_P106428’, ‘A_51_P106799’,
> ‘A_51_P107176’, ‘A_51_P107959’, ‘A_51_P108767’,
> ‘A_51_P109258’, ‘A_51_P109708’, ‘A_51_P110341’,
> ‘A_51_P111757’, ‘A_51_P112427’, ‘A_51_P112662’,
> ‘A_51_P113672’, ‘A_51_P115018’, ‘A_51_P116496’,
> ‘A_51_P116636’, ‘A_51_P117666’, ‘A_51_P118132’,
> ‘A_51_P118168’, ‘A_51_P118400’, ‘A_51_P118506’,
> ‘A_51_P119315’, ‘A_51_P120093’, ‘A_51_P120305’,
> ‘A_51_P120738’, ‘A_51_P120785’, ‘A_51_P121134’,
> ‘A_51_P121359’, ‘A_51_P121412’, ‘A_51_P121652’,
> ‘A_51_P121724’, ‘A_51_P121829’, ‘A_51_P122141’,
> ‘A_51_P122964’, ‘A_51_P123422’, ‘A_51_P123895’,
> ‘A_51_P124008’, ‘A_51_P124719’, ‘A_51_P125648’,
> ‘A_51_P125679’, ‘A_51_P125779 [... truncated]
>
> Is there a better way to discard the duplications in the text file (an Excel
> file is the origin)?
>
>> R.version
>                _
> platform       x86_64-apple-darwin9.8.0
> arch           x86_64
> os             darwin9.8.0
> system         x86_64, darwin9.8.0
> status         Patched
> major          2
> minor          11.1
> year           2010
> month          06
> day            03
> svn rev        52201
> language       R
> version.string R version 2.11.1 Patched (2010-06-03 r52201)
>
> THX
>
> Assa

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
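
Before choosing how to de-duplicate, it can help to look at which identifiers are repeated and how their rows differ. A small sketch, again on invented data (`wb`, `probe` and `symbol` are hypothetical names); make.unique() is shown only as an option for the case where every row must be kept and row names are still wanted:

```r
# Toy data frame standing in for Workbook5 (values are made up).
wb <- data.frame(
  probe  = c("A_1", "A_2", "A_2", "A_3"),
  symbol = c("x",   "y",   "z",   "w"),
  stringsAsFactors = FALSE
)

# Index of the first repeated identifier, or 0 if there is none.
anyDuplicated(wb$probe)          # 3

# Counts per identifier; keep those seen more than once.
counts <- table(wb$probe)
dup_ids <- names(counts)[counts > 1]
dup_ids                          # "A_2"

# Rows behind the repeats, to see how they differ.
wb[wb$probe %in% dup_ids, ]

# If all rows must stay, make.unique() appends .1, .2, ... to repeats.
rownames(wb) <- make.unique(wb$probe)
rownames(wb)                     # "A_1" "A_2" "A_2.1" "A_3"
```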