Re: [R] Summary information by groups programming assitance

Gabor Grothendieck Mon, 22 Dec 2008 18:15:15 -0800

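Just sort the data by vol within each Lake and psd group first, and then apply
any of the solutions but with tail(x, 1) instead of max, so that Length and vol
are both taken from the row with the largest vol, e.g.


DFo <- DF[order(DF$Lake, DF$psd, DF$vol), ]
aggregate(DFo[c("Length", "vol")], DFo[c("Lake", "psd")], tail, 1)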
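
For illustration, here is a small self-contained sketch of the sort-then-tail
idea. It assumes a data frame built only from rows 432-442 of the sample data
quoted below (Lake Clear only, columns Lake, psd, Length and vol), sorted by
vol within each Lake and psd group. The rest of the data is truncated in the
post, so the S-Q result reflects just those two rows.

```r
# Rows 432-442 of the quoted sample data (Lake "Clear" only).
DF <- data.frame(
  Lake   = "Clear",
  psd    = c(rep("substock", 9), rep("S-Q", 2)),
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490, 0.0105, 0.0210)
)

# Sort so that the largest vol is the last row within each Lake/psd group.
DFo <- DF[order(DF$Lake, DF$psd, DF$vol), ]

# tail(x, 1) keeps the last value of each column per group. Because the rows
# are already sorted, Length and vol come from the same row.
res <- aggregate(DFo[c("Length", "vol")], DFo[c("Lake", "psd")], tail, 1)
print(res)
```

For the substock group this picks Length 152 with vol 3.2655, which matches
the first row of the desired output in the original question.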


On Mon, Dec 22, 2008 at 8:15 PM, Ranney, Steven
<steven.ran...@montana.edu> wrote:
> Thank you all for your help.  I appreciate the assistance. I'm thinking I 
> should have been more specific in my original question.
>
> Unless I'm mistaken, all of the suggestions so far have been for maximum vol 
> and maximum Length by Lake and psd.  I'm trying to extract the max vol by 
> Lake and psd along with the corresponding value of Length.  So, instead of 
> maximum vol and maximum Length, I'd like to find the max vol and the Length 
> associated with that value.
>
> Sorry for any confusion,
>
> SR
>
> Steven H. Ranney
> Graduate Research Assistant (Ph.D)
> USGS Montana Cooperative Fishery Research Unit
> Montana State University
> P.O. Box 173460
> Bozeman, MT 59717-3460
>
> phone: (406) 994-6643
> fax: (406) 994-7479
>
> http://studentweb.montana.edu/steven.ranney
> ________________________________
>
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Mon 12/22/2008 5:15 PM
> To: Ranney, Steven
> Cc: r-help@r-project.org
> Subject: Re: [R] Summary information by groups programming assitance
>
>
> Here are two solutions assuming DF is your data frame:
>
> # 1. aggregate is in the base of R
>
> aggregate(DF[c("Length", "vol")], DF[c("Lake", "psd")], max)
>
> or the following which is the same except it labels psd as Category:
>
> aggregate(DF[c("Length", "vol")], with(DF, list(Lake = Lake, Category
> = psd)), max)
>
>
> # 2. sqldf.  The sqldf package allows specification using SQL notation:
>
> library(sqldf)
> sqldf("select Lake, psd as Category, max(Length), max(vol) from DF
> group by Lake, psd")
>
> There are many other good solutions too using various packages which
> have already been mentioned on this thread.
>
> On Mon, Dec 22, 2008 at 4:51 PM, Ranney, Steven
> <steven.ran...@montana.edu> wrote:
>> All -
>>
>> I have data that looks like
>>
>>          psd Species  Lake Length Weight St.weight        Wr      Wr.1    vol
>> 432 substock     SMB Clear    150  41.00      0.01  95.12438  95.10118 0.0105
>> 433 substock     SMB Clear    152  39.00      0.01  86.72916  86.70692 0.0105
>> 434 substock     SMB Clear    152  40.00      3.11  88.95298  82.03689 3.2655
>> 435 substock     SMB Clear    159  48.00      0.04  92.42095  92.34393 0.0420
>> 436 substock     SMB Clear    159  48.00      0.01  92.42095  92.40170 0.0105
>> 437 substock     SMB Clear    165  47.00      0.03  80.38023  80.32892 0.0315
>> 438 substock     SMB Clear    171  62.00      0.21  94.58105  94.26070 0.2205
>> 439 substock     SMB Clear    178  70.00      0.01  93.91912  93.90571 0.0105
>> 440 substock     SMB Clear    179  76.00      1.38 100.15760  98.33895 1.4490
>> 441      S-Q     SMB Clear    180  75.00      0.01  97.09330  97.08035 0.0105
>> 442      S-Q     SMB Clear    180  92.00      0.02 119.10111 119.07522 0.0210
>> ...
>> [truncated]
>>
>> where psd and lake are categorical variables, with five and four
>> categories, respectively.  I'd like to find the maximum vol and the
>> lengths associated with each maximum vol by each category by each lake.
>> In other words, I'd like to have a data frame that looks something like
>>
>> Lake            Category        Length  vol
>> Clear           substock        152             3.2655
>> Clear           S-Q             266             11.73
>> Clear           Q-P             330             14.89
>> ...
>> Pickerel        substock        170             3.4965
>> Pickerel        S-Q             248             10.69
>> Pickerel        Q-P             335             25.62
>> Pickerel        P-M             415             32.62
>> Pickerel        M-T             442             17.25
>>
>>
>> In order to originally get this, I used
>>
>> with(smb[smb$Lake=="Clear",], tapply(vol, list(Length, psd),max))
>> with(smb[smb$Lake=="Enemy.Swim",], tapply(vol, list(Length, psd),max))
>> with(smb[smb$Lake=="Pickerel",], tapply(vol, list(Length, psd),max))
>> with(smb[smb$Lake=="Roy",], tapply(vol, list(Length, psd),max))
>>
>> and pulled the values I needed out by hand and put them into a .csv.
>> Unfortunately, I've got a number of other data sets upon which I'll need
>> to do the same analysis.  Finding a programmable alternative would
>> provide a much easier (and likely less error prone) method to achieve
>> the same results.  Ideally, the "Length" and "vol" data would be in a
>> data frame such that I could then analyze with nls.
>>
>> Does anyone have any thoughts as to how I might accomplish this?
>>
>> Thanks in advance,
>>
>> Steven Ranney
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
