Sorry, I misstated this. It should (of course) read:

If one makes the reasonable assumption that Pct is much longer than Cutoff, sorting Pct is the expensive part, e.g. O(n log2(n)) for Quicksort (n = length of Pct). I believe looping is O(n^2). etc.
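For illustration, here is a minimal R sketch of the two costs being compared. It assumes the goal is, for each cutoff, the total Totpop of tracts with Pct <= cutoff. Jeff's code isn't quoted in this thread, so the findInterval() lookup below is one way to use it, not necessarily his; `cum_pop_loop` and `cum_pop_sorted` are hypothetical names. It also assumes Pct has no NAs, since findInterval() does not accept NAs in its sorted vector.

```r
# Tract data from the dummyData example quoted below.
pct <- c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03)
pop <- c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)

# Hypothetical loop version: one full pass over pct per cutoff,
# i.e. O(n * m) for n = length(pct), m = length(cutoffs).
cum_pop_loop <- function(pct, pop, cutoffs) {
  vapply(cutoffs, function(ct) sum(pop[pct <= ct]), numeric(1))
}

# Hypothetical sort-once version: O(n log n) to order pct,
# then one binary search per cutoff via findInterval().
cum_pop_sorted <- function(pct, pop, cutoffs) {
  o  <- order(pct)
  cs <- cumsum(pop[o])                # running total in increasing Pct order
  k  <- findInterval(cutoffs, pct[o]) # number of Pct values <= each cutoff
  c(0, cs)[k + 1]                     # k == 0: no tract at or below the cutoff
}

cutoffs <- c(0, 0.05, 0.10, 0.25)
cum_pop_loop(pct, pop, cutoffs)   # 0 21050 35800 43800
cum_pop_sorted(pct, pop, cutoffs) # 0 21050 35800 43800
```

The cutoffs passed to findInterval() need not be sorted; only Pct is sorted, which is why its length dominates the cost when it is much longer than Cutoff.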
On Mon, Oct 16, 2023 at 7:48 AM Bert Gunter <bgunter.4...@gmail.com> wrote:
>
> If one makes the reasonable assumption that Pct is much larger than
> Cutoff, sorting Cutoff is the expensive part e.g. O(n log2(n)) for
> Quicksort (n = length Cutoff). I believe looping is O(n^2). Jeff's
> approach using findInterval may be faster. Of course implementation
> details matter.
>
> -- Bert
>
> On Mon, Oct 16, 2023 at 4:41 AM Leonard Mada <leo.m...@syonic.eu> wrote:
> >
> > Dear Jason,
> >
> > The code could look something like:
> >
> > dummyData = data.frame(Tract=seq(1, 10, by=1),
> >     Pct = c(0.05,0.03,0.01,0.12,0.21,0.04,0.07,0.09,0.06,0.03),
> >     Totpop = c(4000,3500,4500,4100,3900,4250,5100,4700,4950,4800))
> >
> > # Define the cutoffs
> > # - allow for duplicate entries;
> > by = 0.03; # by = 0.01;
> > cutoffs <- seq(0, 0.20, by = by)
> >
> > # Create a new column with cutoffs
> > dummyData$Cutoff <- cut(dummyData$Pct, breaks = cutoffs,
> >     labels = cutoffs[-1], ordered_result = TRUE)
> >
> > # Sort data
> > # - we could actually order only the columns:
> > #   Totpop & Cutoff;
> > dummyData = dummyData[order(dummyData$Cutoff), ]
> >
> > # Result
> > cs = cumsum(dummyData$Totpop)
> >
> > # Only last entry:
> > # - I do not have a nice one-liner, but this should do it:
> > isLast = rev(! duplicated(rev(dummyData$Cutoff)))
> >
> > data.frame(Total = cs[isLast],
> >     Cutoff = dummyData$Cutoff[isLast])
> >
> >
> > Sincerely,
> >
> > Leonard
> >
> >
> > On 10/15/2023 7:41 PM, Leonard Mada wrote:
> > > Dear Jason,
> > >
> > >
> > > I do not think that the solution based on aggregate offered by GPT was
> > > correct. That quasi-solution only aggregates for every individual level.
> > >
> > >
> > > As I understand, you want the cumulative sum. The idea was proposed by
> > > Bert; you need only to sort first based on the cutoff (e.g. using an
> > > ordered factor). And then only extract the last value for each level.
> > > If Pct is unique, then you can skip this last step and use the
> > > cumsum directly (but on the sorted data set).
> > >
> > >
> > > Alternatives: see the solutions with loops or with sapply.
> > >
> > >
> > > Sincerely,
> > >
> > >
> > > Leonard
> > >
> > >

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
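A minimal sketch of the cut()/cumsum() idea from Leonard's messages, wrapped in a hypothetical helper `cum_pop_by_cutoff`. Instead of order() plus duplicated(), it sums Totpop per cutoff level with tapply() and then takes the cumulative sum over the ordered levels, so levels with no tracts still get a row (repeating the previous total). It warns when Pct values fall outside the breaks, which cut() turns into NA. Like the quoted code, it assumes the breaks are sorted and the first break is at or below the smallest Pct.

```r
# Same dummy data as in the thread.
dummyData <- data.frame(
  Tract  = 1:10,
  Pct    = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
  Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)
)

# Hypothetical helper: cumulative Totpop for each upper cutoff,
# keeping levels that contain no tracts.
cum_pop_by_cutoff <- function(pct, pop, cutoffs) {
  # Bins are (cutoffs[i], cutoffs[i + 1]]; include.lowest = TRUE also
  # keeps values equal to cutoffs[1] instead of making them NA.
  lev <- cut(pct, breaks = cutoffs, labels = cutoffs[-1],
             include.lowest = TRUE, ordered_result = TRUE)
  if (anyNA(lev)) {
    warning("some Pct values lie outside the cutoff range and are dropped")
  }
  # One total per level: tapply() skips NA levels, default = 0 fills empty ones.
  per_level <- tapply(pop, lev, sum, default = 0)
  data.frame(Cutoff = cutoffs[-1], Total = cumsum(as.vector(per_level)))
}

cum_pop_by_cutoff(dummyData$Pct, dummyData$Totpop, c(0, 0.05, 0.10, 0.11, 0.25))
# Totals: 21050 35800 35800 43800 (the empty (0.10, 0.11] level repeats 35800)
```

With Leonard's `by = 0.03`, `seq(0, 0.20, by = by)` stops at 0.18, so the tract with Pct = 0.21 falls outside every bin; in the quoted code, if I read it right, it ends up as an extra row with Cutoff NA, while this helper drops it and warns. Extending the last break past the largest Pct avoids both.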