Hi Jeff,

Thanks for your help, but this doesn't work. There are two problems. First, and most important, I need to keep the last row _per category_ (my category is n), not the last row globally. Second, the subset variation seems to end up not filtering anything, but this is a minor thing.
Best.
Giovanni

On Sep 9, 2012, at 5:59 PM, Jeff Newmiller wrote:

> dfthin <- df[ c(which(iter %% 500 == 0),nrow(df) ]
>
> or
>
> dfthin <- subset(df, (iter %% 500 == 0) | (seq.int(nrow(df)==nrow(df)))
>
> N.B. You should avoid using the name "df" for your variables, because it is
> the name of a built-in function that you are hiding by doing so. Others may
> be confused, and eventually you may want to use that function yourself. One
> solution is to use DF for your variables... another is to use more
> descriptive names.
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> Giovanni Azua <brave...@gmail.com> wrote:
>
>> Hello,
>>
>> I bumped into the following funny use-case. I have too much data for a
>> given plot. I have the following data frame df:
>>
>> > str(df)
>> 'data.frame': 5015 obs. of 5 variables:
>> $ n          : Factor w/ 5 levels "1000","2000",..: 1 1 1 1 1 1 1 1 1 1 ...
>> $ iter       : int 10 20 30 40 50 60 70 80 90 100 ...
>> $ Error      : num 1.05e-02 1.24e-03 3.67e-04 1.08e-04 4.05e-05 ...
>> $ Duality_Gap: num 20080 3789 855 443 321 ...
>> $ Runtime    : num 0.00536 0.01353 0.01462 0.01571 0.01681 ...
>>
>> But if I plot e.g. Runtime vs log(Duality Gap) I have too many
>> observations due to taking a snapshot every 10 iterations rather than
>> say 500 and the plot looks very cluttered. So I would like to trim the
>> data frame including only those records for which iter is multiple of
>> 500 and so I do this:
>>
>> df <- subset(df, iter %% 500 == 0)
>>
>> This gives me almost exactly what I need except that the last and most
>> important Duality Gap observations are of course gone due to the
>> filtering ... I would like to change the subset clause to be iter %%
>> 500 _or_ the record is the last per n (n is my problem size and
>> category in this case) ... how can I do that?
>>
>> I thought of adding a new column that flags whether a given row is the
>> last element per category as "last" Boolean but this is a bit too
>> complicated .. is there a simpler condition construct that can be used
>> with the subset command?
>>
>> TIA,
>> Best regards,
>> Giovanni
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
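
One possible way to express "multiple of 500 _or_ last row per n" is sketched below. It is only an illustration, not something proposed in the thread: the data frame DF, the helper make_group(), and the group sizes are all invented stand-ins, and it assumes that "last per category" means the row with the largest iter within each level of n. The per-group maximum comes from base R's ave(), which returns a vector aligned with the rows of the data frame.

```r
# Hypothetical stand-in for the data frame in the thread. It is named DF
# (not df) to avoid masking the built-in df() function. The group sizes are
# invented so that the last iter of each n is NOT a multiple of 500.
make_group <- function(n, last_iter) {
  iter <- seq(10, last_iter, by = 10)
  data.frame(n = n, iter = iter, Runtime = iter / 1000)
}
DF <- rbind(make_group("1000", 1230), make_group("2000", 2010))
DF$n <- factor(DF$n)

# TRUE for the row holding the largest iter within each level of n.
# ave() applies max() per group and returns one value per row of DF.
is_last <- DF$iter == ave(DF$iter, DF$n, FUN = max)

# Keep every 500th iteration, plus the last row of each category.
DFthin <- DF[DF$iter %% 500 == 0 | is_last, ]
print(DFthin)
```

The same condition should also work inside subset(), e.g. subset(DF, iter %% 500 == 0 | iter == ave(iter, n, FUN = max)). As for the subset variation that filtered nothing: seq.int(nrow(df)==nrow(df)) evaluates to seq.int(TRUE), which is 1, and 1 is treated as TRUE on every row by the | operator. The intended form was presumably something like seq_len(nrow(df)) == nrow(df), though that would still only keep the last row globally.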