Re: [R] Visualizing binary response data?

2010-05-05 Thread Antony Unwin
You could also try using interactive graphics in iplots.  Linking from a 
barchart of your binary response variable to your eight continuous predictors 
in a parallel coordinate plot and to your four categorical predictors in some 
form of mosaicplot could be very informative.
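
For readers without an interactive session, a static version of the same two
displays can be sketched in base R plus MASS.  This is only an illustration on
simulated data, not the linked iplots workflow described above: the variables
X, g and y are made up, and colouring the lines by the response stands in for
interactive linking.

```r
library(MASS)   # for parcoord()

# Simulated stand-in data: 8 continuous predictors and 1 categorical predictor.
set.seed(1)
n <- 200
X <- as.data.frame(matrix(rnorm(n * 8), n, 8))   # columns V1..V8
g <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
y <- rbinom(n, 1, plogis(0.8 * X$V1 - 0.5 * X$V2 + (g == "c")))

# Parallel coordinate plot: one line per case, coloured by the binary response.
parcoord(X, col = ifelse(y == 1, "red", "grey60"))

# Mosaic plot of the categorical predictor against the response.
tab <- table(g, y)
mosaicplot(tab, main = "g by y", color = TRUE)
```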

Graphics are not necessarily the method of choice to select your predictor 
variables, as Frank Harrell has pointed out.  It is also sensible not to rely 
on modelling alone.  Graphic displays can help you better understand your data 
and models.  The two approaches are complementary.

Antony Unwin
University of Augsburg
Germany


On Tue, May 4, 2010 at 9:04 PM, Kim Jung Hwa wrote:

> Hi All,
> 
> I'm dealing with binary response data for the first time, and I'm confused
> about what kind of graphics I could explore in order to pick relevant
> predictors and see their relation with the response variable.
> 
> I have 8-10 continuous predictors and 4-5 categorical predictors. Can anyone
> suggest what kind of graphics I can explore to see how the predictors behave
> w.r.t. the response variable?
> 
> Any help would be greatly appreciated, thanks,
> Kim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Visualizing binary response data?

2010-05-04 Thread Frank E Harrell Jr

On 05/04/2010 09:12 PM, Thomas Stewart wrote:

> For binary w.r.t. continuous, how about a smoothing spline?  As in,
>
> x<-rnorm(100)
> y<-rbinom(100,1,exp(.3*x-.07*x^2)/(1+exp(.3*x-.07*x^2)))
> plot(x,y)
> lines(smooth.spline(x,y))
>
> OR how about a more parametric approach, logistic regression?  As in,
>
> glm1<-glm(y~x+I(x^2),family=binomial)
> plot(x,y)
> lines(sort(x),predict(glm1,newdata=data.frame(x=sort(x)),type="response"))
>
> FOR binary w.r.t. categorical it depends.  Are the categories ordinal (is
> there a natural ordering?) or are the categories nominal (no ordering)?  For
> nominal categories, the data is essentially a contingency table, and
> "strength of the predictor" is a test of independence.  You can still do a
> graphical exploration: maybe plotting the proportion of Y=1 for each
> category of X.   As in,
>
> z<-cut(x,breaks=-3:3)
> plot(tapply(y,z,mean))
>
> If your goal is to find strong predictors of Y, you may want to consider
> graphical measures that look at the predictors jointly.  Maybe with a
> generalized additive model (gam)?
>
> There is probably a lot more you can do.  Be creative.
>
> -tgs


And you have to decide why you would look to a graph to select 
predictors.  This can badly distort later inferences (confidence 
intervals, P-values, biased regression coefficients, biased R^2, etc.).
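
A small simulation can make this concrete.  It is only a sketch under
assumptions that are not in the thread: the response is generated
independently of ten pure-noise predictors, and "picking the predictor that
looks best" is mimicked by taking the smallest univariate Wald P-value, which
stands in for choosing from a graph.  The names n_sim, min_p and p_selected
are made up for the example.

```r
set.seed(42)
n_sim <- 200   # number of simulated data sets
n     <- 100   # observations per data set
p     <- 10    # candidate predictors, none related to y

# For one simulated data set, return the P-value of the "best looking" predictor.
min_p <- function() {
  y <- rbinom(n, 1, 0.5)               # response unrelated to every predictor
  X <- matrix(rnorm(n * p), n, p)      # pure-noise predictors
  pvals <- apply(X, 2, function(x) {
    fit <- glm(y ~ x, family = binomial)
    coef(summary(fit))["x", "Pr(>|z|)"]
  })
  min(pvals)
}

p_selected <- replicate(n_sim, min_p())

# Share of data sets in which the selected predictor appears "significant".
# With no real effects this should be near 0.05 if selection were harmless;
# under independence it is roughly 1 - 0.95^10 instead.
mean(p_selected < 0.05)
```

The reported P-value of a predictor chosen this way no longer means what it
claims, which is the distortion described above.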


Frank






--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University



Re: [R] Visualizing binary response data?

2010-05-04 Thread Thomas Stewart
For binary w.r.t. continuous, how about a smoothing spline?  As in,

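set.seed(1)                       # make the simulated example reproducible
x <- rnorm(100)
# true P(Y = 1) is quadratic in x on the logit scale
y <- rbinom(100, 1, plogis(0.3 * x - 0.07 * x^2))
plot(x, y)
lines(smooth.spline(x, y))        # smoothed estimate of P(Y = 1 | x)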

OR how about a more parametric approach, logistic regression?  As in,

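# logistic regression with a quadratic term in x
glm1 <- glm(y ~ x + I(x^2), family = binomial)
plot(x, y)
# fitted probabilities, drawn in increasing order of x
lines(sort(x), predict(glm1, newdata = data.frame(x = sort(x)), type = "response"))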

FOR binary w.r.t. categorical it depends.  Are the categories ordinal (is
there a natural ordering?) or are the categories nominal (no ordering)?  For
nominal categories, the data is essentially a contingency table, and
"strength of the predictor" is a test of independence.  You can still do a
graphical exploration: maybe plotting the proportion of Y=1 for each
category of X.   As in,

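z <- cut(x, breaks = -3:3)            # bin x to mimic a categorical predictor
plot(tapply(y, z, mean), type = "b", xaxt = "n",
     xlab = "category of X", ylab = "proportion of Y = 1")
axis(1, at = seq_along(levels(z)), labels = levels(z))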
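
For a genuinely nominal predictor the same idea might look like the sketch
below.  The data are simulated; grp, p_true and prop1 are names made up for
the illustration, and the success rates are arbitrary.

```r
set.seed(2)
n <- 300
grp <- factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
p_true <- c(A = 0.2, B = 0.4, C = 0.4, D = 0.7)   # hypothetical success rates
y <- rbinom(n, 1, p_true[as.character(grp)])

tab <- table(grp, y)                   # the 4 x 2 contingency table
prop1 <- prop.table(tab, 1)[, "1"]     # proportion of Y = 1 in each category

barplot(prop1, ylim = c(0, 1),
        xlab = "category", ylab = "proportion of Y = 1")

chisq.test(tab)                        # test of independence of grp and y
```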

If your goal is to find strong predictors of Y, you may want to consider
graphical measures that look at the predictors jointly.  Maybe with a
generalized additive model (gam)?
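
One possible way to do that is with the mgcv package, sketched here on
simulated data.  The variables x1, x2 and grp and the true effects are
invented for the example, and mgcv is just one of several packages that fit
such models.

```r
library(mgcv)

set.seed(3)
n <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)                                    # has no effect on y
grp <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
eta <- 0.3 * x1 - 0.5 * x1^2 + 0.8 * (grp == "C") # true linear predictor
y <- rbinom(n, 1, plogis(eta))

# Smooth terms for the continuous predictors, a plain factor term for grp.
fit <- gam(y ~ s(x1) + s(x2) + grp, family = binomial, method = "REML")
summary(fit)

# Estimated smooth effects on the logit scale, all predictors in one display.
plot(fit, pages = 1, shade = TRUE)
```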

There is probably a lot more you can do.  Be creative.

-tgs






[R] Visualizing binary response data?

2010-05-04 Thread Kim Jung Hwa
Hi All,

I'm dealing with binary response data for the first time, and I'm confused
about what kind of graphics I could explore in order to pick relevant
predictors and see their relation with the response variable.

I have 8-10 continuous predictors and 4-5 categorical predictors. Can anyone
suggest what kind of graphics I can explore to see how the predictors behave
w.r.t. the response variable?

Any help would be greatly appreciated, thanks,
Kim
