Re: [R] use subset to trim data but include last per category

Giovanni Azua Sun, 09 Sep 2012 09:22:52 -0700

Hi Jeff,

Thanks for your help, but this doesn't work; there are two problems. First and 
most important, I need to keep the last row _per category_, where my category 
is n, and not the last row globally. Second, there seems to be an issue with the 
subset variation, which ends up not filtering anything ... but this is a minor thing.
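
For reference, a minimal sketch of the per-category condition. The data frame
DF below is a hypothetical stand-in built only for illustration (three problem
sizes with made-up final iterations); it assumes the rows are ordered by iter
within each n, as in the str() output quoted further down.

```r
# Hypothetical stand-in for the data frame described in the thread:
# three problem sizes n, a snapshot every 10 iterations, and a different
# final iteration per category (none of them a multiple of 500).
last_iter <- c("1000" = 1230, "2000" = 2040, "3000" = 990)
DF <- do.call(rbind, lapply(names(last_iter), function(size) {
  data.frame(n = size, iter = seq(10, last_iter[[size]], by = 10))
}))
DF$n <- factor(DF$n)

# duplicated(n, fromLast = TRUE) is FALSE only for the last occurrence of
# each value of n, so negating it flags the last row per category.
# This relies on the rows being ordered by iter within each n.
DFthin <- subset(DF, iter %% 500 == 0 | !duplicated(n, fromLast = TRUE))
```

With this condition each level of n keeps its multiples of 500 plus its own
final snapshot. The quoted subset() variant probably keeps every row because,
as parenthesized, its second condition is seq.int(nrow(df)==nrow(df)), which
reduces to the single value 1 and is recycled as TRUE for all rows.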


Best.
Giovanni

On Sep 9, 2012, at 5:59 PM, Jeff Newmiller wrote:

> dfthin <- df[ c(which(df$iter %% 500 == 0), nrow(df)), ]
> 
> or
> 
> dfthin <- subset(df, (iter %% 500 == 0) | (seq.int(nrow(df)==nrow(df)))
> 
> N.B. You should avoid using the name "df" for your variables, because it is 
> the name of a built-in function that you are hiding by doing so. Others may 
> be confused, and eventually you may want to use that function yourself. One 
> solution is to use DF for your variables... another is to use more 
> descriptive names.
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                      Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> --------------------------------------------------------------------------- 
> Sent from my phone. Please excuse my brevity.
> 
> Giovanni Azua <brave...@gmail.com> wrote:
> 
>> Hello,
>> 
>> I bumped into the following funny use-case. I have too much data for a
>> given plot. I have the following data frame df: 
>> 
>>> str(df)
>> 'data.frame':        5015 obs. of  5 variables:
>> $ n          : Factor w/ 5 levels "1000","2000",..: 1 1 1 1 1 1 1 1 1 1 ...
>> $ iter       : int  10 20 30 40 50 60 70 80 90 100 ...
>> $ Error      : num  1.05e-02 1.24e-03 3.67e-04 1.08e-04 4.05e-05 ...
>> $ Duality_Gap: num  20080 3789 855 443 321 ...
>> $ Runtime    : num  0.00536 0.01353 0.01462 0.01571 0.01681 ...
>> 
>> But if I plot e.g. Runtime vs log(Duality Gap) I have too many
>> observations, because a snapshot is taken every 10 iterations rather
>> than, say, every 500, and the plot looks very cluttered. So I would like
>> to trim the data frame to include only those records for which iter is a
>> multiple of 500, and so I do this:
>> 
>> df <- subset(df, iter %% 500 == 0)
>> 
>> This gives me almost exactly what I need, except that the last and most
>> important Duality Gap observations are of course gone due to the
>> filtering ... I would like to change the subset clause to be iter %%
>> 500 == 0 _or_ the record is the last per n (n is my problem size and
>> category in this case) ... how can I do that?
>> 
>> I thought of adding a new column that flags whether a given row is the
>> last element per category as "last" Boolean but this is a bit too
>> complicated .. is there a simpler condition construct that can be used
>> with the subset command?
>> 
>> TIA,
>> Best regards,
>> Giovanni    
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 

