Re: [R] Summary information by groups programming assistance

Gabor Grothendieck Mon, 22 Dec 2008 18:30:30 -0800

The sort should have been by Lake, psd and vol, not by Lake, Length and vol as I
had it, so the code should be revised to:


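# sort so that within each Lake/psd group the row with the largest vol comes last
DFo <- DF[order(DF$Lake, DF$psd, DF$vol), ]
# tail(x, 1) then takes that last row's Length and vol in each group
aggregate(DFo[c("Length", "vol")], DFo[c("Lake", "psd")], tail, 1)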

This is the same as before except DF$psd is used in place of DF$Length
in the first line.
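For example, with a few rows copied from the sample data in the original question (only the Lake, psd, Length and vol columns; DF here is a toy stand-in for the real data frame), the corrected code returns the largest vol in each Lake/psd group together with the Length from that same row. This is a minimal sketch, not a test on the full data set:

```r
# Toy stand-in for DF: a few rows from the question's sample output
# (Lake, psd, Length and vol columns only).
DF <- data.frame(
  Lake   = "Clear",
  psd    = c(rep("substock", 9), "S-Q", "S-Q"),
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179, 180, 180),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490, 0.0105, 0.0210)
)

# Sort so that, within each Lake/psd group, the largest vol is the last row.
DFo <- DF[order(DF$Lake, DF$psd, DF$vol), ]

# tail(x, 1) is applied column by column, but because every column was
# sorted the same way it picks Length and vol from the same (last) row.
res <- aggregate(DFo[c("Length", "vol")], DFo[c("Lake", "psd")], tail, 1)
res
```

For the substock rows this gives Length 152 with vol 3.2655, which matches the first line of the desired output in the original post. If two rows in a group tie on the largest vol, the one that comes last after sorting is returned; since order() leaves unresolved ties in their original order, that is the later of the tied rows in DF.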

On Mon, Dec 22, 2008 at 9:14 PM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:
> Just sort the data first and then apply any of the solutions but with tail(x, 1)
> instead of max, e.g.
>
> DFo <- DF[order(DF$Lake, DF$Length, DF$vol), ]
> aggregate(DFo[c("Length", "vol")], DFo[c("Lake", "psd")], tail, 1)
>
>
> On Mon, Dec 22, 2008 at 8:15 PM, Ranney, Steven
> <steven.ran...@montana.edu> wrote:
>> Thank you all for your help.  I appreciate the assistance. I'm thinking I 
>> should have been more specific in my original question.
>>
>> Unless I'm mistaken, all of the suggestions so far have been for maximum vol 
>> and maximum Length by Lake and psd.  I'm trying to extract the max vol by 
>> Lake and psd along with the corresponding value of Length.  So, instead of 
>> maximum vol and maximum Length, I'd like to find the max vol and the Length 
>> associated with that value.
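A minimal sketch of the distinction, assuming a data frame DF with the columns from the original question (here just the Clear/substock rows copied from its sample output): taking max column by column can pair the largest vol with a Length from a different row, while picking the row with which.max keeps the two together. The split/which.max route is an alternative to the solutions posted in this thread, not one of them.

```r
# Toy stand-in for DF: the Clear/substock rows from the question.
DF <- data.frame(
  Lake   = "Clear",
  psd    = "substock",
  Length = c(150, 152, 152, 159, 159, 165, 171, 178, 179),
  vol    = c(0.0105, 0.0105, 3.2655, 0.0420, 0.0105, 0.0315,
             0.2205, 0.0105, 1.4490)
)

# Column-wise maxima: max(Length) and max(vol) are computed independently,
# so the Length need not come from the row holding the largest vol.
indep <- aggregate(DF[c("Length", "vol")], DF[c("Lake", "psd")], max)

# Row-wise alternative: within each Lake/psd group keep the whole row whose
# vol is largest (which.max returns the first such row if there are ties).
groups <- split(DF, list(DF$Lake, DF$psd), drop = TRUE)
paired <- do.call(rbind, lapply(groups, function(d) d[which.max(d$vol), ]))

indep   # Length 179, vol 3.2655
paired  # Length 152, vol 3.2655
```

Note that which.max keeps the first of any tied rows, whereas the sort-then-tail approach keeps the last, so the two can differ only when a group has tied maximum vol values.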
>>
>> Sorry for any confusion,
>>
>> SR
>>
>> Steven H. Ranney
>> Graduate Research Assistant (Ph.D)
>> USGS Montana Cooperative Fishery Research Unit
>> Montana State University
>> P.O. Box 173460
>> Bozeman, MT 59717-3460
>>
>> phone: (406) 994-6643
>> fax: (406) 994-7479
>>
>> http://studentweb.montana.edu/steven.ranney
>> ________________________________
>>
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Mon 12/22/2008 5:15 PM
>> To: Ranney, Steven
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Summary information by groups programming assistance
>>
>>
>> Here are two solutions assuming DF is your data frame:
>>
>> # 1. aggregate is in base R
>>
>> aggregate(DF[c("Length", "vol")], DF[c("Lake", "psd")], max)
>>
>> or the following which is the same except it labels psd as Category:
>>
>> aggregate(DF[c("Length", "vol")],
>>           with(DF, list(Lake = Lake, Category = psd)), max)
>>
>>
>> # 2. sqldf.  The sqldf package lets you specify the query in SQL:
>>
>> library(sqldf)
>> sqldf("select Lake, psd as Category, max(Length), max(vol) from DF
>> group by Lake, psd")
>>
>> There are many other good solutions too, using various packages, which
>> have already been mentioned on this thread.
>>
>> On Mon, Dec 22, 2008 at 4:51 PM, Ranney, Steven
>> <steven.ran...@montana.edu> wrote:
>>> All -
>>>
>>> I have data that looks like
>>>
>>>          psd Species  Lake Length Weight St.weight        Wr      Wr.1    vol
>>> 432 substock     SMB Clear    150  41.00      0.01  95.12438  95.10118 0.0105
>>> 433 substock     SMB Clear    152  39.00      0.01  86.72916  86.70692 0.0105
>>> 434 substock     SMB Clear    152  40.00      3.11  88.95298  82.03689 3.2655
>>> 435 substock     SMB Clear    159  48.00      0.04  92.42095  92.34393 0.0420
>>> 436 substock     SMB Clear    159  48.00      0.01  92.42095  92.40170 0.0105
>>> 437 substock     SMB Clear    165  47.00      0.03  80.38023  80.32892 0.0315
>>> 438 substock     SMB Clear    171  62.00      0.21  94.58105  94.26070 0.2205
>>> 439 substock     SMB Clear    178  70.00      0.01  93.91912  93.90571 0.0105
>>> 440 substock     SMB Clear    179  76.00      1.38 100.15760  98.33895 1.4490
>>> 441      S-Q     SMB Clear    180  75.00      0.01  97.09330  97.08035 0.0105
>>> 442      S-Q     SMB Clear    180  92.00      0.02 119.10111 119.07522 0.0210
>>> ...
>>> [truncated]
>>>
>>> where psd and Lake are categorical variables with five and four
>>> categories, respectively.  I'd like to find the maximum vol, and the
>>> Length associated with each maximum vol, for each psd category in each lake.
>>> In other words, I'd like to have a data frame that looks something like
>>>
>>> Lake       Category   Length   vol
>>> Clear      substock   152      3.2655
>>> Clear      S-Q        266      11.73
>>> Clear      Q-P        330      14.89
>>> ...
>>> Pickerel   substock   170      3.4965
>>> Pickerel   S-Q        248      10.69
>>> Pickerel   Q-P        335      25.62
>>> Pickerel   P-M        415      32.62
>>> Pickerel   M-T        442      17.25
>>>
>>>
>>> To get this originally, I ran
>>>
>>> with(smb[smb$Lake == "Clear", ], tapply(vol, list(Length, psd), max))
>>> with(smb[smb$Lake == "Enemy.Swim", ], tapply(vol, list(Length, psd), max))
>>> with(smb[smb$Lake == "Pickerel", ], tapply(vol, list(Length, psd), max))
>>> with(smb[smb$Lake == "Roy", ], tapply(vol, list(Length, psd), max))
>>>
>>> and pulled the values I needed out by hand and put them into a .csv.
>>> Unfortunately, I've got a number of other data sets on which I'll need
>>> to do the same analysis.  A programmable alternative would be a much
>>> easier (and likely less error-prone) way to get the same results.
>>> Ideally, the "Length" and "vol" data would be in a data frame that I
>>> could then analyze with nls.
>>>
>>> Does anyone have any thoughts as to how I might accomplish this?
>>>
>>> Thanks in advance,
>>>
>>> Steven Ranney
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>