Kelly:

Glad you got what you were looking for, but this whole thread raises the
question: (why) should you do this? You lose information by binning
continuous data, of course. Perhaps your answer is that the point
scatter in the data is too noisy to clearly discern what's going on, a
legitimate response. One might then (or in general) consider
overlaying a fitted smooth (nonparametric) curve on the data to
reveal the "trend." There are a zillion ways to do this in R: both
lattice and ggplot2 have built-in capabilities to do this easily, as
does base R with ?scatter.smooth. If that's too easy, you can do it by
hand via ?lowess (or its more flexible cousin, ?loess),
?smooth.spline, etc. In fact, your binning strategy is a crude,
non-smooth version of such smoothing, so it's not that far-fetched. Or,
as some of the choicer R-Help pages say, cutting and boxplotting is to
smoothing as histograms are to nonparametric density estimates.
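A minimal base-R sketch of the smoothing alternatives named above, run on simulated data (the vectors `x` and `y` are hypothetical stand-ins, not the poster's data). `scatter.smooth()` draws the points plus a loess curve in one call; `lowess()`, `loess()` and `smooth.spline()` are then layered on with `lines()`. The span values are arbitrary choices for illustration, not recommendations.

```r
# Hypothetical simulated data: a noisy, non-linear trend
set.seed(42)
x <- runif(300, 0, 400)
y <- 20 + 0.15 * x + 10 * sin(x / 60) + rnorm(300, sd = 8)

# 1. Base R one-liner: scatterplot with a loess smooth overlaid
scatter.smooth(x, y, span = 2/3, degree = 1, xlab = "x", ylab = "y")

# 2. By hand with lowess(): returns list(x, y) sorted by x, ready for lines()
lw <- lowess(x, y, f = 2/3)
lines(lw, col = "blue", lwd = 2, lty = 2)

# 3. loess(): formula interface, supports predict() on new x values
fit  <- loess(y ~ x, span = 0.5)
grid <- seq(min(x), max(x), length.out = 100)
lines(grid, predict(fit, newdata = data.frame(x = grid)),
      col = "darkgreen", lwd = 2)

# 4. smooth.spline(): smoothness chosen by generalized cross-validation by default
ss <- smooth.spline(x, y)
lines(predict(ss, grid), col = "purple", lwd = 2)
```

Unlike cutting x into a handful of bins, each of these gives a continuous estimate of the trend, with the span (or spline smoothness) playing the role the bin width plays in the boxplot approach.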

Cheers,
Bert


On Fri, Dec 9, 2011 at 12:05 PM, Vining, Kelly
<kelly.vin...@oregonstate.edu> wrote:
> Thanks to David and Jorge - both of your helpful suggestions got me to the 
> desired endpoint. In case anyone else has this question: I boxplotted my y 
> variable data, but did the "cut" operation on the x variable in order to 
> conserve the order of the y data. I see another suggestion coming in from 
> another user that basically says this.
>
> So, my working line of code was:
>
> boxplot(count$RPKM ~ cut(count$C_count, breaks=4))
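A self-contained sketch of this working approach, with a hypothetical simulated `count` data frame (the column names `C_count` and `RPKM` follow the thread, but the values are made up). Using `data =` keeps the formula short, and `boxplot()` invisibly returns its statistics, so the number of points in each box can be checked.

```r
# Hypothetical data frame standing in for the poster's 'count' (values are simulated)
set.seed(1)
count <- data.frame(C_count = sample(0:400, 500, replace = TRUE))
count$RPKM <- pmax(0, 0.2 * count$C_count + rnorm(500, sd = 15))

# Cut the x variable into 4 equal-width intervals; y (RPKM) is untouched,
# so each RPKM value stays paired with its own C_count
b <- boxplot(RPKM ~ cut(C_count, breaks = 4), data = count,
             xlab = "C_count interval", ylab = "RPKM")

b$n      # number of points in each box
b$names  # interval labels generated by cut()
```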
>
> Much appreciation to everyone who responded...thanks for helping with a naïve 
> question without making me feel stupid.
>
> This discussion board is very, very good.
>
> --Kelly V.
>
> -----Original Message-----
> From: David Winsemius [mailto:dwinsem...@comcast.net]
> Sent: Friday, December 09, 2011 11:58 AM
> To: Uwe Ligges
> Cc: Vining, Kelly; r-help@r-project.org
> Subject: Re: [R] scatterplot to boxplot translation?
>
>
> On Dec 9, 2011, at 2:50 PM, Uwe Ligges wrote:
>
>>
>>
>> On 09.12.2011 20:41, Vining, Kelly wrote:
>>> Thanks for the tip on "cut," seems like it should work. I must still
>>> be missing something, though. Here, I'm cutting on the y variable,
>>> then attempting the boxplot:
>>>
>>> cutRPKM<- cut(count$RPKM, breaks=4)
>>>
>>> head(cutRPKM)
>>> [1] (-0.0995,24.8] (-0.0995,24.8] (-0.0995,24.8] (-0.0995,24.8] (-0.0995,24.8]
>>> [6] (-0.0995,24.8]
>>> Levels: (-0.0995,24.8] (24.8,49.8] (49.8,74.7] (74.7,99.6]
>>>
>>> boxplot(as.numeric(cutRPKM))
>>>
>>> This gives me a single box instead of five boxes. ??
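A small sketch of why this produces one box, using hypothetical simulated values: `as.numeric()` on a factor returns the integer level codes (1 to 4 here), so `boxplot()` receives a single numeric vector and summarizes those codes rather than RPKM. Passing the factor as the grouping variable in a formula gives one box per interval. `plot = FALSE` is used only so the returned statistics can be inspected.

```r
set.seed(2)
RPKM    <- runif(200, 0, 100)      # hypothetical y values
cutRPKM <- cut(RPKM, breaks = 4)   # factor with 4 interval levels

codes <- as.numeric(cutRPKM)       # integer level codes; the RPKM values are gone
table(codes)

# A single vector gives a single box (of the codes, not of RPKM)
b1 <- boxplot(codes, plot = FALSE)
ncol(b1$stats)   # 1

# y ~ factor splits RPKM by interval: one box per level
b4 <- boxplot(RPKM ~ cutRPKM, plot = FALSE)
ncol(b4$stats)   # 4
```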
>>
>>
>> You obviously want:
>>
>> boxplot(count$RPKM ~ cut(count$RPKM, breaks=seq(0, max(count$RPKM),
>> by=100)))
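One caveat with `seq(0, max(count$RPKM), by=100)`, sketched below on hypothetical simulated values (it applies equally whether y, as here, or x is being cut): the sequence stops at the largest multiple of 100 not exceeding the maximum, so points above the last break fall outside every interval, become `NA`, and are silently left out of the plot. With the default right-closed intervals an exact 0 is also dropped unless `include.lowest = TRUE`, and if the maximum is below 100, `seq()` returns just 0 and `cut()` stops with an error. Rounding the upper break up is one way around this; the width of 100 comes from the original question.

```r
set.seed(3)
x <- c(0, sample(1:450, 300, replace = TRUE))   # hypothetical values, including an exact 0

# Naive breaks: stop at the last multiple of 100 <= max(x)
naive <- seq(0, max(x), by = 100)
sum(is.na(cut(x, breaks = naive)))   # the 0 and everything above the last break

# Round the top break up to the next multiple of 100 and keep 0 in the first bin
width <- 100
brks  <- seq(0, width * ceiling(max(x) / width), by = width)
xbin  <- cut(x, breaks = brks, include.lowest = TRUE)
table(xbin, useNA = "ifany")         # no NA category
```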
>
> In that context (having defined a cut variable with a single-integer breaks
> argument), I would have thought this should work:
>
>  boxplot(count$RPKM ~ cutRPKM)
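Why the precomputed factor works, as a sketch on hypothetical simulated data: `cut()` returns one level per input value, in the same order, so `cutRPKM` lines up element by element with `count$RPKM`. The formula `y ~ f` simply splits `y` by the levels of `f`, which can be checked against boxplotting the list returned by `split()` directly.

```r
set.seed(4)
RPKM    <- runif(300, 0, 100)      # hypothetical y values
cutRPKM <- cut(RPKM, breaks = 4)   # one factor level per RPKM value, same order
stopifnot(length(cutRPKM) == length(RPKM))

# Formula interface: RPKM grouped by the levels of cutRPKM
bf <- boxplot(RPKM ~ cutRPKM, plot = FALSE)

# Same grouping done explicitly with split()
bs <- boxplot(split(RPKM, cutRPKM), plot = FALSE)

all.equal(bf$stats, bs$stats)   # TRUE
```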
>
> --
> David.
>
>>
>>
>> Uwe Ligges
>>
>>
>>> Thanks again,
>>> --Kelly V.
>>> ________________________________________
>>> From: David Winsemius [dwinsem...@comcast.net]
>>> Sent: Friday, December 09, 2011 11:14 AM
>>> To: Vining, Kelly
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] scatterplot to boxplot translation?
>>>
>>> On Dec 9, 2011, at 2:11 PM, Vining, Kelly wrote:
>>>
>>>> My apologies if anyone is seeing this twice...looks like my previous
>>>> message didn't come through...
>>>>
>>>> Dear UseRs,
>>>> I have a feeling this is a relatively simple question, but I'm
>>>> having a hard time getting my head around it. I have a simple x-y
>>>> scatterplot with many points, as shown below (attached). I'd like to
>>>> make a boxplot of this by interval, such that there is one box
>>>> representing the points in the 0-100 interval, one for the 101-200
>>>> interval, and so on. How do I structure my R data frame to be able
>>>> to generate such a boxplot?
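A sketch of the intervals described here (0-100, 101-200, and so on), assuming integer x values as a count suggests; the data are hypothetical simulated values. With `cut()`'s default right-closed intervals and `include.lowest = TRUE`, the bins are [0,100], (100,200], ..., which for integers are exactly 0-100, 101-200, ...; the `labels` argument only gives them those friendlier names. No restructuring of the data frame is needed, since the grouping factor is computed on the fly.

```r
set.seed(5)
C_count <- sample(0:350, 400, replace = TRUE)    # hypothetical integer x values
RPKM    <- 0.3 * C_count + rnorm(400, sd = 20)   # hypothetical y values

width <- 100
brks  <- seq(0, width * ceiling(max(C_count) / width), by = width)

# Lower label bound: 0 for the first bin, previous break + 1 afterwards
lower <- c(0, brks[-c(1, length(brks))] + 1)
labs  <- paste0(lower, "-", brks[-1])             # "0-100", "101-200", ...

grp <- cut(C_count, breaks = brks, labels = labs, include.lowest = TRUE)
boxplot(RPKM ~ grp, xlab = "C_count", ylab = "RPKM")
```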
>>>>
>>>
>>> ?cut
>>>
>>>>
>>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Vining, Kelly
>>>> Sent: Friday, December 09, 2011 11:01 AM
>>>> To: r-help@r-project.org
>>>> Subject: [R] scatterplot to boxplot translation?
>>>>
>>>>
>>>> <C_count_vs_RPKM.png>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> David Winsemius, MD
>>> West Hartford, CT
>>>
>
> David Winsemius, MD
> West Hartford, CT
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
