RE: clusters within a sample

2002-01-08 Thread Simon, Steve, PhD

Yvonne Unrau writes:



I am working with a large administrative data set (N=1,086) for a foster care agency. In short, I am comparing client outcomes across two branches (each is delivering a different service model). For analyses, I am using logistic regression (SPSS) where my dependent variables include a variety of outcomes measuring program success vs. failure. My test variable is the program (two groups), plus I have several other demographic and service related variables.

My problem is that I have two types of clusters of children in my data set:

siblings from the same biological family (may or may not be placed in the same foster home)

foster children placed in one foster home (may or may not be siblings)

I am looking for ways to test the amount of error associated with the above clusters using SPSS. My strategy to date has been to SELECT the restricted sample, run the LR analysis, then eyeball the results. What are my other options?



Wow! A messy data set. What fun!



The first thing you should do is get a handle on the size of your clusters. Are they usually just one child, with clusters of two or more children being rare? Or is it the opposite case, where almost every cluster has two or more children in it?
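
As a rough sketch of that first check, here is one way to tabulate cluster sizes in Python. The records and the (child, biological family, foster home) layout are made up for illustration; your own file will have its own identifiers.

```python
from collections import Counter

# Hypothetical records: (child_id, bio_family_id, foster_home_id).
records = [
    (1, "F1", "H1"),
    (2, "F1", "H1"),
    (3, "F1", "H2"),
    (4, "F2", "H2"),
    (5, "F3", "H3"),
    (6, "F4", "H4"),
]

def cluster_size_distribution(rows, key_index):
    """Return {cluster size: number of clusters of that size}."""
    # First count the children in each cluster, then count how many
    # clusters share each size.
    children_per_cluster = Counter(row[key_index] for row in rows)
    size_counts = Counter(children_per_cluster.values())
    return dict(sorted(size_counts.items()))

family_sizes = cluster_size_distribution(records, key_index=1)
home_sizes = cluster_size_distribution(records, key_index=2)

print("biological families:", family_sizes)
print("foster homes:", home_sizes)
```

If nearly all of the counts land on size 1, there is very little information available for estimating a cluster effect.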



If most of your data is just one child in each cluster, then it may make sense to lower your expectations. A binary dependent variable gives you relatively little information about variability (at least compared to a continuous variable) and you may be trying to estimate something without enough data to get any reasonable estimates.



Second, you need to understand how the data behaves at a higher level. Create an aggregate variable across all members of the cluster (for example, the proportion of successes) and then model that aggregate variable. This is tricky, and you may have to use a model which assumes nice normal residuals when your data is clearly non-normal. That's okay, because you are just trying to get a starting point for a more complex analysis.
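
A minimal sketch of that aggregation step in Python, using made-up records. It assumes each foster home belongs to exactly one program and that success is coded 0/1; both are assumptions for illustration, not facts about your data.

```python
from collections import defaultdict

# Hypothetical records: (foster_home_id, program, success), success coded 0/1.
records = [
    ("H1", "A", 1),
    ("H1", "A", 0),
    ("H2", "A", 1),
    ("H3", "B", 0),
    ("H3", "B", 0),
    ("H3", "B", 1),
]

def aggregate_by_cluster(rows):
    """Collapse child-level rows to one summary row per cluster."""
    outcomes = defaultdict(list)
    programs = {}
    for home, program, success in rows:
        outcomes[home].append(success)
        programs[home] = program  # assumes one program per foster home
    return {
        home: {
            "program": programs[home],
            "n": len(values),
            "success_rate": sum(values) / len(values),
        }
        for home, values in outcomes.items()
    }

cluster_summary = aggregate_by_cluster(records)
for home, row in cluster_summary.items():
    print(home, row)
```

The cluster-level success rate can then be compared across the two programs with a simple model, keeping in mind that clusters of size one can only produce rates of 0 or 1.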



Third, you need to abandon SPSS and use software that can model random effects in a logistic regression analysis. The beta-binomial model is the one that was first developed for this kind of data, but other models have been used more recently. I think SAS and Stata can handle this type of analysis and there is probably other software as well.
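
For reference, one common form of a logistic regression with a random cluster effect is the random intercept model. This is only a sketch of the general idea; the notation (child i in cluster j, covariate vector x, normally distributed intercepts) is my assumption and not the only way to set it up.

```latex
\operatorname{logit} \Pr(y_{ij} = 1 \mid u_j)
  = \beta_0 + \beta_1\,\mathrm{program}_{ij}
  + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + u_j,
\qquad u_j \sim N(0, \sigma_u^2)
```

Here the variance \sigma_u^2 measures how much children in the same cluster resemble one another beyond what the covariates explain.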



Fourth, you need to estimate each cluster effect separately first. Estimate the sibling effect ignoring the foster family effect. If possible, randomly select only one member within each foster family and do the analysis with a random sibling effect. Reverse the process and estimate the foster family effect after randomly selecting only one sibling.
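
The random selection step could be sketched like this in Python. The records, the field layout, and the helper name one_per_cluster are all hypothetical, and the seed is fixed only so the subsample can be reproduced.

```python
import random
from collections import defaultdict

# Hypothetical records: (child_id, bio_family_id, foster_home_id).
records = [
    (1, "F1", "H1"),
    (2, "F1", "H1"),
    (3, "F1", "H2"),
    (4, "F2", "H2"),
    (5, "F3", "H3"),
    (6, "F4", "H4"),
]

def one_per_cluster(rows, key_index, seed=0):
    """Randomly keep exactly one row from each cluster."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    members = defaultdict(list)
    for row in rows:
        members[row[key_index]].append(row)
    return [rng.choice(group) for group in members.values()]

# One child per foster home: any remaining clustering is by sibling group.
one_per_home = one_per_cluster(records, key_index=2)

# One child per biological family: any remaining clustering is by foster home.
one_per_family = one_per_cluster(records, key_index=1)

print(one_per_home)
print(one_per_family)
```

Because the subsample depends on the random draw, it is worth repeating the analysis with a few different seeds to see how stable the estimates are.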



Fifth, see if you can estimate both effects simultaneously. This model is very complex and even software that can handle random effects in a logistic regression model may not be able to handle this.
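
To make the simultaneous model concrete, one way to write it is with two crossed random intercepts, one for the biological family and one for the foster home. The notation below (child i, biological family j, foster home k, independent normal effects) is an assumption on my part, offered as a sketch rather than a recommended specification.

```latex
\operatorname{logit} \Pr(y_{ijk} = 1 \mid u_j, v_k)
  = \beta_0 + \beta_1\,\mathrm{program}_{ijk}
  + \mathbf{x}_{ijk}^{\top}\boldsymbol{\gamma} + u_j + v_k,
\qquad u_j \sim N(0, \sigma_u^2), \quad v_k \sim N(0, \sigma_v^2)
```

The effects are crossed rather than nested because siblings may or may not share a foster home, which is a large part of why the model is hard to fit.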



You may want to become friends with someone in the Statistics Department at your university. This is a very tricky analysis.



Good luck!



Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.

The STATS web page has moved to http://www.childrens-mercy.org/stats

clusters within a sample

2002-01-07 Thread Yvonne Unrau


I am working with a large administrative data set (N=1,086) for a
foster care agency. In short, I am comparing client outcomes across two
branches (each is delivering a different service model). For analyses, I
am using logistic regression (SPSS) where my dependent variables include
a variety of outcomes measuring program success vs. failure. My test
variable is the program (two groups), plus I have several other
demographic and service related variables. 

My problem is that I have two types of clusters of children
in my data set:

siblings from the same biological family (may or may not be placed in
the same foster home)

foster children placed in one foster home (may or may not be
siblings)

I am looking for ways to test the amount of error associated
with the above clusters using SPSS. My strategy to date has been to
SELECT the restricted sample, run the LR analysis, then eyeball the
results. What are my other options?

Many thanks.

Yvonne A. Unrau, PhD
Associate Professor
School of Social Work
Illinois State
University
Campus Box 4650
Normal, Illinois 61790-4650
Direct Office Phone: (309) 438-8579
School Office Phone: (309) 438-3631
School Fax: (309) 438-5880
e-mail: [EMAIL PROTECTED]