Some clarification would help.  See below.

On Wed, 1 Aug 2001, Teen Assessment Project wrote:

> I have an overall sample of 5000+  from 40+ different towns and 6 
> different grades. 

In approximately equal numbers per town/grade, or not?  
Are all 6 grades (which grades?) represented in each town?
Do they always coexist within schools, or are they divided (e.g., 
between junior-high schools and high schools)?  (Etc.)
How were these cases sampled from the population?  
(And possibly relevant:  how large is the population?)
[Bluntly:  How do you know the overall sample is worth using as a 
standard of comparison, as appears to be desired?]

> One person wants to look at a subsample of 200 from specific towns and 
> grades and compare this subsample with the rest of the group on outcome 
> variables.   Advice appreciated here.

Why 200?  (Arbitrary round number?  Result of power calculation?  
 Maximum size dictated by constraints you haven't bothered to mention? 
 Outcome of consulting the entrails of a NH Red chicken?)
 Why not _all_ the respondents from the specified towns?  

> The only demographics are self-reported family structure and
> maternal/paternal education. 

If you know the names of the towns (which seems to be implied by the 
description of the desired subsample), you also know the population of 
the towns and some (admittedly rather general) other demographic 
information:  e.g., whether the school is located in a community (and 
what kind of community, e.g., Manchester HS West) or in a wilderness 
("Vox clamantis in deserto", as they say at Dartmouth, which would also 
be appropriate for John Stark Regional HS, in the wilds of western 
Weare).  In the larger towns, do you also know the names of the schools 
containing the members of your sample?  That might provide additional 
detail.  (Or not, for city schools that draw students from outlying more 
rural or suburban areas.)

> Ideas:  1) I could try to match the demographics/grades of the 
> selected 200 with 200/5000 other subjects.

Why would you wish to do that?  You write above, "One person wants to 
look at a subsample of 200 ... and compare this subsample with the rest 
of the group on outcome variables."  The matching you propose would 
seem, on the face of it, to invalidate any comparison between the groups 
to be compared. 

> 2) I could randomly  select 200/5000 other subjects and test to see if
> there is a sign difference in the demographics.  
        [One presumes "sign" here is a contraction of "significant",
         and does not (necessarily) imply a sign test.]

True.  This does not appear to be what "One person" wants to do, though, 
which is to compare (= test?) for differences in the outcome variables.
Something's missing here.  What does "One person" really want to do (or 
say s/he wants to do), when not constrained to speak in a kind of 
pseudo-statistish language?  What theory informs the intent of the 
proposed study (or, if no theory, what kinds of practical decisions 
might it be reasonably expected to lead to?)?

> 3)??  4)??  

Alternative sampling procedures aren't useful to contemplate in the 
absence of design or purpose information.

> Outcome variables are all categorical --

By this do you mean that they are all of "yes/no" or "true/false" form 
(or equivalent)?  Or are some of them a choice of one from among several 
named categories?  Or multiple choices among multiple categories?  
 Are any of these sets of ordered categories (such as one might elicit 
from Likert-type items)?
 Do the variables come in sets (or dimensions) that lend themselves to 
any kind of summary scoring?  (E.g., total # of categories of this kind 
that are "true" or "yes" (or whatever) for this particular case.)
 Ought you to be doing some sort of scaling analysis on the categories, 
to produce interval-level scaled variables?  (Search on "dual scaling" 
and "correspondence analysis".)
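
To make the "summary scoring" idea concrete, here is a minimal sketch in 
Python.  It assumes each case is stored as a dictionary of yes/no answers; 
the item names and the grouping into a set are hypothetical, invented for 
illustration, and are not taken from the survey being discussed.

```python
def summary_score(responses, items):
    """Count how many of the named items this case answered 'yes' (True).

    Items that are missing or answered 'no' contribute nothing.
    """
    return sum(1 for item in items if responses.get(item) is True)


# Hypothetical set of yes/no items thought to belong to one dimension.
risk_items = ["item_a", "item_b", "item_c", "item_d"]

# One hypothetical respondent.
case = {"item_a": True, "item_b": False, "item_c": True, "item_d": True}

print(summary_score(case, risk_items))
```

Whether such a count is meaningful depends on whether the items really do 
form a set; that is a question for the investigator, not for the arithmetic.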

> assuming chi-squares testing here. 

There are a variety of kinds of chi-square tests.  If you are (as one 
suspects) referring to two-dimensional cross-classification tables, and 
testing the independence of classifications, this is of course possible.  
It may not be optimal, depending in large part on what the _real_ 
questions are that "One person" wants to address, and on the nature(s) 
of the variables of interest.  Scaling of the category systems would 
yield variables you could subject to various linear models -- multiple  
regression, analysis of variance/covariance, Hotelling's T-square, etc. 
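
For the two-dimensional case mentioned above, a minimal sketch in Python 
of the Pearson chi-square test of independence follows.  The rows are 
assumed to be "subsample" vs. "rest of the group" and the columns the 
categories of one outcome variable; the counts are invented purely for 
illustration.  The function returns only the statistic and its degrees of 
freedom, to be compared with a chi-square table or handed to a statistics 
package for a p-value.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an
    r x c table of counts (a list of equal-length rows).

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: one three-category outcome variable.
counts = [
    [60, 90, 50],        # the subsample of 200
    [1500, 2300, 1000],  # the remaining respondents
]

stat, df = chi_square_independence(counts)
print(f"chi-square = {stat:.3f} on {df} df")
```

This treats the categories as unordered; if they are in fact ordered, the 
test ignores that information, which is one reason it may not be optimal.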


 ------------------------------------------------------------------------
 Donald F. Burrill                                 [EMAIL PROTECTED]
 184 Nashua Road, Bedford, NH 03110                          603-471-7128


