When your brain is wired to treat a data frame like a matrix, you think
things like

for ( col in colnames( D ) ) {
  idx <- expr
  D[ idx, col ] <- otherexpr
}

are reasonable, when

for ( col in colnames( D ) ) {
  idx <- expr
  D[[ col ]][ idx ] <- otherexpr
}

actually runs significantly faster.

On December 21, 2021 9:28:52 AM PST, "Fox, John" <j...@mcmaster.ca> wrote:
>Dear Jeff,
>
>On 2021-12-21, 11:59 AM, "R-help on behalf of Jeff Newmiller"
><r-help-boun...@r-project.org on behalf of jdnew...@dcn.davis.ca.us> wrote:
>
> Intuitive, perhaps, but noticeably slower.
>
>I think that in most applications, one wouldn't notice the difference; for
>example:
>
>> D <- data.frame(matrix(rnorm(1000*1e6), 1e6, 1000))
>
>> microbenchmark(D[, 1])
>Unit: microseconds
>   expr   min    lq    mean median     uq    max neval
> D[, 1] 3.321 3.362 3.98561  3.444 3.5875 51.291   100
>
>> microbenchmark(D[[1]])
>Unit: microseconds
>   expr   min    lq    mean median     uq    max neval
> D[[1]] 1.722 1.763 1.99137  1.804 1.8655 17.876   100
>
>Best,
> John
>
>
> And it doesn't work on tibbles by design. Data frames are lists of columns.
>
>
> On December 21, 2021 8:38:35 AM PST, Duncan Murdoch
> <murdoch.dun...@gmail.com> wrote:
> >On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
> >> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
> >>> Thanks for the reply.
> >>>
> >>> sort(unique(Data[1]))
> >>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
> >>> decreasing)) :
> >>>    undefined columns selected
> >>
> >> That's the wrong syntax: Data[1] is not "column one of Data". Use
> >> Data[[1]] for that, so
> >>
> >>    sort(unique(Data[[1]]))
> >
> >Actually, I'd probably recommend
> >
> >    sort(unique(Data[, 1]))
> >
> >instead. This treats Data as a matrix rather than as a list.
> >Dataframes are lists that look like matrices, but to me the matrix
> >aspect is usually more intuitive.
> >
> >Duncan Murdoch
> >
> >>
> >> I think Rui already pointed out the typo in the quoted text below...
> >>
> >> Duncan Murdoch
> >>
> >>>
> >>> The recommended syntax did not work, as listed above.
> >>>
> >>> What I want is the sort of distinct column output. Again, the column may
> >>> be text or numbers. This is a huge analysis effort with data coming at
> >>> me from many different sources.
> >>>
> >>>
> >>> *Stephen Dawson, DSL*
> >>> /Executive Strategy Consultant/
> >>> Business & Technology
> >>> +1 (865) 804-3454
> >>> http://www.shdawson.com <http://www.shdawson.com>
> >>>
> >>>
> >>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
> >>>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
> >>>>> Thanks everyone for the replies.
> >>>>>
> >>>>> It is clear one either needs to write a function or put the unique
> >>>>> entries into another dataframe.
> >>>>>
> >>>>> It seems odd R cannot sort a list of unique column entries with ease.
> >>>>> Python and SQL can do it with ease.
> >>>>
> >>>> I've seen several responses that looked pretty simple. It's hard to
> >>>> beat sort(unique(x)), though there's a fair bit of confusion about
> >>>> what you actually want. Maybe you should post an example of the code
> >>>> you'd use in Python?
> >>>>
> >>>> Duncan Murdoch
> >>>>
> >>>>>
> >>>>> QUESTION
> >>>>> Is there a simpler means other than the unique function to capture
> >>>>> distinct column entries, then sort that list?
> >>>>>
> >>>>>
> >>>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> Inline.
> >>>>>>
> >>>>>> At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help wrote:
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>> sort(unique(Data[[1]]))
> >>>>>>>
> >>>>>>> This syntax provides row numbers, not column values.
> >>>>>>
> >>>>>> This is not right.
> >>>>>> The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
> >>>>>> extracts the column vector.
> >>>>>>
> >>>>>> As for my previous answer, it was not addressing the question; I
> >>>>>> misinterpreted it as being a question on how to sort by numeric order
> >>>>>> when the data is not numeric. Here is a, hopefully, complete answer.
> >>>>>> Still with package stringr.
> >>>>>>
> >>>>>>
> >>>>>> cols_to_sort <- 1:4
> >>>>>>
> >>>>>> Data2 <- lapply(Data[cols_to_sort], \(x){
> >>>>>>   stringr::str_sort(unique(x), numeric = TRUE)
> >>>>>> })
> >>>>>>
> >>>>>>
> >>>>>> Or using Avi's suggestion of writing a function to do all the work and
> >>>>>> simplify the lapply loop later,
> >>>>>>
> >>>>>>
> >>>>>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
> >>>>>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
> >>>>>>
> >>>>>>
> >>>>>> Hope this helps,
> >>>>>>
> >>>>>> Rui Barradas
> >>>>>>
> >>>>>>
> >>>>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Running a simple syntax set to review entries in dataframe columns.
> >>>>>>>> Here is the working code.
> >>>>>>>>
> >>>>>>>> Data <- read.csv("./input/Source.csv", header=T)
> >>>>>>>> describe(Data)
> >>>>>>>> summary(Data)
> >>>>>>>> unique(Data[1])
> >>>>>>>> unique(Data[2])
> >>>>>>>> unique(Data[3])
> >>>>>>>> unique(Data[4])
> >>>>>>>>
> >>>>>>>> I would like to sort the unique entries. The data in the various
> >>>>>>>> columns are not only defined as numbers, but also as text. I realize
> >>>>>>>> 1 and 10 will not sort properly, as the column is not defined as a
> >>>>>>>> number, but want to see what I have in the columns viewed as sorted.
> >>>>>>>>
> >>>>>>>> QUESTION
> >>>>>>>> What is the best process to sort unique output, please?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>
> >>>>>>> ______________________________________________
> >>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>> PLEASE do read the posting guide
> >>>>>>> http://www.R-project.org/posting-guide.html
> >>>>>>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.

--
Sent from my phone. Please excuse my brevity.
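
For anyone following along, the indexing forms discussed in this thread can be compared side by side. What follows is only a minimal sketch in base R: the toy data frames, the condition used to build idx, and the replacement value 0 are made up for illustration and do not come from any of the posts above.

```r
# Hypothetical toy data; not the Data from the original question.
D <- data.frame(a = c(3, 1, 2, 1),
                b = c("10", "2", "1", "2"),
                stringsAsFactors = FALSE)

# Single brackets keep the data frame wrapper; double brackets and the
# matrix-style form both return the column itself.
class(D[1])     # "data.frame"
class(D[[1]])   # "numeric"
class(D[, 1])   # "numeric" for a base data.frame

# Sorted distinct entries for every column, using list-style access.
sorted_unique <- lapply(D, function(x) sort(unique(x)))

# The two assignment loops from the top of this message, made concrete.
# Assumption: idx is a logical row selector and negative values become 0.
D1 <- D2 <- data.frame(x = c(-1, 2, -3), y = c(4, -5, 6))

for (col in colnames(D1)) {      # matrix-style replacement
  idx <- D1[[col]] < 0
  D1[idx, col] <- 0
}

for (col in colnames(D2)) {      # list-style replacement
  idx <- D2[[col]] < 0
  D2[[col]][idx] <- 0
}

# Both loops should leave the same values in each column.
all.equal(D1, D2)
```

Wrapping each loop in microbenchmark() or system.time() on a larger data frame is one way to check how much the two forms differ on your own data.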