variance estimation and cross-validation

2001-07-24 Thread Mark Everingham

Can anyone help with this please?

---

I have a set of N images. I train a classifier to label the pixels in an image
as one of a set of classes. To estimate the accuracy of the classifier I use
k-fold cross-validation, training on k-1 folds and testing on the remaining
fold. Thus the estimated mean per-image accuracy is

mu = mean(mean[i], i=1..k)

where mean[i] is the mean accuracy across the images in fold i

I also want to know how much the accuracy varies from one image to another.
I can think of two ways of estimating this:

(a) sigma^2 = mean(var[i], i=1..k)

where var[i] is the variance of the accuracy across the images in fold i

or

(b) sigma^2 = var(mean[i], i=1..k) * n

where n is the number of images in each of the folds.

---

An example:

fold  mean   var
   1  91.43  36.2404
   2  89.05  58.3696
   3  97.39  3.3856
   4  89.38  78.1456
   5  91.09  104.858
   6  88.49  87.4225
   7  86.59  148.596
   8  90.36  97.8121
   9  86.05  77.6161
  10  88.98  125.44

n = 8 (fold size)

mu = 89.881
sigma^2 by (a) = 81.7886 (sigma = 9.0437)
sigma^2 by (b) = 71.7367 (sigma = 8.4698)
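
A minimal Python sketch of both estimators, assuming the per-image accuracies
are available as a k-by-n array (the data here are made up; only the formulas
follow the description above):

import numpy as np

# acc[i, j] = accuracy on the j-th image of fold i (hypothetical data:
# k = 10 folds of n = 8 images each, roughly matching the example above).
rng = np.random.default_rng(0)
acc = rng.normal(90.0, 9.0, size=(10, 8))
k, n = acc.shape

fold_means = acc.mean(axis=1)          # mean[i]
fold_vars = acc.var(axis=1, ddof=1)    # var[i], within-fold variance across images

mu = fold_means.mean()                 # overall estimated accuracy
sigma2_a = fold_vars.mean()            # (a) average within-fold variance
sigma2_b = fold_means.var(ddof=1) * n  # (b) n times the variance of the fold means

print(f"mu = {mu:.3f}")
print(f"(a) sigma^2 = {sigma2_a:.4f}, sigma = {np.sqrt(sigma2_a):.4f}")
print(f"(b) sigma^2 = {sigma2_b:.4f}, sigma = {np.sqrt(sigma2_b):.4f}")

Note that (b) infers the per-image variance indirectly, from the spread of the
fold means, so the two estimates will generally differ, as they do in the
example above.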

---

Which estimate is better, or are both incorrect? I appreciate that the fold
size (8) and number of folds (10) are small. Is there a better way? Is there
any way to establish a confidence interval on the estimate?

Thanks
Mark



Mark Everingham   Phone: +44 117 9545249
Room 1.15 Fax:   +44 117 9545208
Merchant Venturers Building   Email: [EMAIL PROTECTED]
University of Bristol WWW:   http://www.cs.bris.ac.uk/~everingm/
Bristol BS8 1UB, UK









SPC in Iron Casting Foundry

2001-07-24 Thread See Liang

Can anyone recommend books or websites where I can find information
specific to SPC applications in an iron casting foundry?
TIA.

Best Regards,
ONG See Liang
e-mail : [EMAIL PROTECTED] 
Remove NO..SPAM from e-mail 





Re: Statistics Q

2001-07-24 Thread Jerry Dallal

Herman Rubin wrote:
 
> For prediction, we should estimate the distribution of
> the errors and use that; the distribution of the errors
> of estimate are not going to be too far from normal
> compared to that, if the regression is a reasonable
> model.  Lack of near independence between the predictors
> and the errors puts a major question on the prediction.
>
> For enlightenment, the distribution of the errors is not
> of much importance.

Nicely stated.





RE: Interclass Correlation??

2001-07-24 Thread Paul R. Swank

If your interest is reliability, then you don't need to do any statistical
comparisons. What you are describing is a case for generalizability theory,
in which you use the data to estimate the variance components and then
estimate what the reliability would be if you vary the number of trials.
The books by Brennan and by Shavelson & Webb, or the original by Cronbach
et al., would be helpful.

Cronbach, L., Gleser, G., Nanda, H., & Rajaratnam, N. (1972). The
dependability of behavioral measurements. New York: Wiley.

Brennan, R. (1983). Elements of generalizability theory. Iowa City, IA:
American College Testing Program.

Shavelson, R., & Webb, N. (1991). Generalizability theory: A primer. Newbury
Park, CA: Sage.
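
As a rough numerical illustration of the variance-components idea in the
simplest (one-facet) case - a sketch with made-up scores, not taken from the
references above - a one-way random-effects ANOVA gives a single-trial
reliability, and a Spearman-Brown-type projection then shows how the
reliability of a mean changes with the number of trials:

import numpy as np

# scores[s, t] = score of subject s on trial t (made-up data:
# 20 subjects x 6 trials, stable subject differences plus trial noise).
rng = np.random.default_rng(1)
subject_level = rng.normal(50.0, 10.0, size=(20, 1))
scores = subject_level + rng.normal(0.0, 5.0, size=(20, 6))

n_subjects, n_trials = scores.shape
grand_mean = scores.mean()
subject_means = scores.mean(axis=1)

# One-way random-effects ANOVA mean squares.
ms_between = n_trials * ((subject_means - grand_mean) ** 2).sum() / (n_subjects - 1)
ms_within = ((scores - subject_means[:, None]) ** 2).sum() / (n_subjects * (n_trials - 1))

# Variance components and single-trial reliability (ICC).
var_subjects = (ms_between - ms_within) / n_trials
var_error = ms_within
icc_1 = var_subjects / (var_subjects + var_error)

# Spearman-Brown-type projection: reliability of the mean of k trials.
for k in range(1, n_trials + 1):
    rel_k = k * icc_1 / (1 + (k - 1) * icc_1)
    print(f"projected reliability with {k} trial(s): {rel_k:.3f}")

The generalizability-theory treatments cited above extend this to multiple
facets (conditions as well as trials).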

Paul R. Swank, Ph.D.
Professor
Developmental Pediatrics
UT Houston Health Science Center

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]]On Behalf Of Clark Dickin
Sent: Monday, July 23, 2001 10:08 PM
To: [EMAIL PROTECTED]
Subject: Interclass Correlation??


I am trying to determine the reliability of a balance test for individuals
with Alzheimer's disease. The test involves six different conditions, with
each condition consisting of three trials (6 x 3). Each individual has
performed the complete test twice, which gives me 6 trials for each of the 6
conditions. I would like to determine at what point the individual's
performance becomes reliable (stable). Specifically, I want to know how many
trials need to be performed in order to determine when the individual has
moved beyond learning and into actual performance.

Specifically, my questions are:
(1) whether or not an ICC is the appropriate test to perform,

(2) if the ICC is appropriate do I need to calculate an ICC for each set of
two consecutive trials or for the entire group of 6 trials for each
condition of the six condition test, and

(3) Do I need to correct the alpha level to account for the multiple
comparisons (.05/# of contrasts)?

Any help would be appreciated.

Clark Dickin
[EMAIL PROTECTED]










Re: SPC in Iron Casting Foundry

2001-07-24 Thread a2q

Recommend you decide first what is important for a specific
product/process.

Start with run charts - not to worry about full SPC.

See what that tells you about product consistency - I bet you will learn a
lot early on.
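
For instance, a plain run chart takes only a few lines; this sketch uses
made-up hardness readings and matplotlib (the variable names are illustrative,
not from any real foundry data):

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical casting hardness readings, in production order.
rng = np.random.default_rng(2)
hardness = rng.normal(200.0, 6.0, size=40)
center = np.median(hardness)

plt.plot(hardness, marker="o")
plt.axhline(center, linestyle="--", label=f"median = {center:.1f}")
plt.xlabel("casting number (production order)")
plt.ylabel("hardness")
plt.title("Run chart - look for trends, shifts, and cycles")
plt.legend()
plt.show()

Control limits (e.g. an individuals chart) can be added later, once the run
chart has shown what the process is doing.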

Then work toward more calculation-intensive charts.

Only a very few charts are needed, if you pick subjects that people care
about.

Jay
On Tue, 24 Jul 2001, See Liang wrote:

> Date: Tue, 24 Jul 2001 11:15:06 GMT
> From: See Liang [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: SPC in Iron Casting Foundry
>
> Can anyone recommend books or websites where I can find information
> specific to SPC applications in an iron casting foundry?
> TIA.
>
> Best Regards,
> ONG See Liang
> e-mail : [EMAIL PROTECTED]
> Remove NO..SPAM from e-mail




-- 
Warner Consulting, Inc.
   A2Q - Approach to Quality
   Quality & Productivity Improvement that Works!
Melissa Warner, President
Jay Warner, Principal Scientist
Phone:  (414) 634-9100
FAX:(414) 681-1133
email:  [EMAIL PROTECTED]
Snail mail:
   North Green Bay Road
  Racine, WI 53404-1216
  USA



=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



SRSes

2001-07-24 Thread Dennis Roberts

most books talk about inferential statistics ... particularly those where 
you take a sample ... find some statistic ... estimate some error term ... 
then build a CI or test some null hypothesis ...

error in these cases is always assumed to be based on taking AT LEAST a 
simple random sample ... or SRS as some books like to say ...

but, we KNOW that most samples are drawn in a way that is WORSE than SRS ...

thus, essentially every CI ... is too narrow ... or, every test statistic 
... t or F or whatever ... has a p value that is too LOW ...

what adjustment do we make for this basic problem?
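
one way to put a rough number on that worry is to simulate a worse-than-SRS
design - say, cluster sampling with correlated units within clusters - and
compare the true sampling variance of the mean with what the naive SRS formula
reports (a sketch with invented parameters, not a claim about any particular
study):

import numpy as np

rng = np.random.default_rng(3)

n_clusters, m = 20, 10   # each "sample" is 20 clusters of 10 units (n = 200)
reps = 5000

sample_means, naive_srs_vars = [], []
for _ in range(reps):
    cluster_effects = rng.normal(0.0, 1.0, size=(n_clusters, 1))   # shared within a cluster
    x = (cluster_effects + rng.normal(0.0, 1.0, size=(n_clusters, m))).ravel()
    sample_means.append(x.mean())
    naive_srs_vars.append(x.var(ddof=1) / x.size)   # the usual SRS formula s^2 / n

true_var = np.var(sample_means, ddof=1)   # actual sampling variance of the mean
print(f"true sampling variance of the mean: {true_var:.5f}")
print(f"average naive SRS estimate:         {np.mean(naive_srs_vars):.5f}")
print(f"ratio (design effect):              {true_var / np.mean(naive_srs_vars):.2f}")

with these invented numbers (half the variance between clusters, clusters of
10) the design effect is about 1 + (10-1)*0.5 = 5.5, so the SRS-based interval
would indeed be far too narrow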

_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: SRSes

2001-07-24 Thread Donald Burrill

Hi, Dennis!
Yes, as you point out, most elementary textbooks treat only SRS 
types of samples.  But while (as you also point out) some more realistic 
sampling methods entail larger sampling variance than SRS, some of them 
have _smaller_ variance -- notably, stratified designs when the strata 
differ between themselves on the quantity being measured.
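A quick simulation makes the point (a sketch with a made-up two-stratum 
population and proportional allocation, not real survey data):

import numpy as np

rng = np.random.default_rng(4)

# Made-up population: two equal strata whose means differ.
stratum_a = rng.normal(40.0, 5.0, size=50_000)
stratum_b = rng.normal(60.0, 5.0, size=50_000)
population = np.concatenate([stratum_a, stratum_b])

n, reps = 100, 5000
srs_means, strat_means = [], []
for _ in range(reps):
    srs_means.append(rng.choice(population, size=n, replace=False).mean())
    a = rng.choice(stratum_a, size=n // 2, replace=False)   # proportional allocation
    b = rng.choice(stratum_b, size=n // 2, replace=False)
    strat_means.append(np.concatenate([a, b]).mean())

print(f"sampling variance of the mean, SRS:        {np.var(srs_means, ddof=1):.3f}")
print(f"sampling variance of the mean, stratified: {np.var(strat_means, ddof=1):.3f}")

With these numbers the between-stratum component dominates, so proportional 
stratification cuts the sampling variance of the mean by roughly a factor of 
five.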

On Tue, 24 Jul 2001, Dennis Roberts wrote:

> most books talk about inferential statistics ... particularly those
> where you take a sample ... find some statistic ... estimate some error
> term ... then build a CI or test some null hypothesis ...
>
> error in these cases is always assumed to be based on taking AT LEAST a
> simple random sample ... or SRS as some books like to say ...
>
> but, we KNOW that most samples are drawn in a way that is WORSE than SRS ...

I don't think _I_ know this.  I know that SOME samples are so drawn;
but (see above) I also know that SOME samples are drawn in a way that
is BETTER than SRS (where I assume by "worse" you meant "with larger
sampling variance", so by "better" I mean "with smaller sampling
variance").

> thus, essentially every CI ... is too narrow ... or, every test
> statistic ... t or F or whatever ... has a p value that is too LOW ...
>
> what adjustment do we make for this basic problem?

I perceive the basic problem as the fact that sampling variance is 
(relatively) easily calculated for a SRS, while it is more difficult 
to calculate under almost _any_ other type of sampling.  
 Whether it is enough more difficult that one would REALLY like to avoid 
it in an elementary course is a judgement call;  but for the less 
quantitatively-oriented students with whom many of us have to deal, we 
_would_ often like to avoid those complications.  Certainly dealing with 
the completely _general_ case is beyond the scope of a first course, so 
it's just a matter of deciding how many, and which, specific types of 
cases one is willing to shoehorn into the semester (and what previews 
of coming attractions one wishes to allude to in higher-level courses). 

Seems to me the most sensible adjustment (and of a type we make at 
least implicitly in a lot of other areas too) is 
 = to acknowledge that the calculations for SRS are presented 
   (a) for a somewhat unrealistic ideal kind of case,
   (b) to give the neophyte _some_ experience in playing this game,
   (c) to see how the variance depends (apart from the sampling scheme)
on the sample size (and on the estimated value, if one is 
estimating proportions or percentages),
   (d) in despite of the fact that most real sampling is carried out 
under distinctly non-SRS conditions, and therefore entails 
variances for which SRS calculations may be quite awry;  and
 = to have yet another situation for which one can point out that for 
actually DOING anything like this one should first consult a 
competent statistician (or, perhaps, _become_ one!).

Some textbooks I have used (cf. Moore, Statistics: Concepts & 
Controversies (4th ed.), Table 1.1, page 40) present a table giving the 
margin of error for the Gallup poll sampling procedure, as a function of 
population percentage and sample size.  Such a table permits one to show 
how Gallup's precision varies from what one would calculate for a SRS, 
thus providing some small emphasis for the cautionary tale one wishes to 
convey.
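
For comparison, the SRS margin of error itself is a one-liner; a small sketch 
(my own numbers, not Moore's Table 1.1):

import math

def srs_margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion under SRS."""
    return z * math.sqrt(p * (1.0 - p) / n)

for n in (250, 500, 1000, 1500):
    row = ", ".join(f"p={p:.1f}: +/-{100 * srs_margin_of_error(p, n):.1f}" for p in (0.2, 0.5, 0.8))
    print(f"n = {n:4d}:  {row}")

For n around 1000 and p near 0.5 this gives the familiar "plus or minus 3 
points", which can then be set against the table's design-based figures.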

 
 Donald F. Burrill [EMAIL PROTECTED]
 184 Nashua Road, Bedford, NH 03110  603-471-7128






Re: Regression Modeling Strategies

2001-07-24 Thread Rich Ulrich

On Sat, 21 Jul 2001 12:08:38 -0400, [EMAIL PROTECTED] wrote:

> I am pleased (and relieved) to announce the publication of
> Regression Modeling Strategies, With Applications to Linear
> Models, Logistic Regression, and Survival Analysis
> (Springer, June 2001).

[ ... ]

> More information may be obtained from
> http://hesweb1.med.virginia.edu/biostat/rms

I searched in Books in Print for Harrell and didn't
find it.  It is in there by title -- as above -- but with no author.

Listed as ISBN 0-387-95232-2, 632 pages, $79.95.
I'm running right down to place my order.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: SRSes

2001-07-24 Thread Dennis Roberts

At 03:55 PM 7/24/01 -0400, Donald Burrill wrote:
> Hi, Dennis!
> Yes, as you point out, most elementary textbooks treat only SRS
> types of samples.  But while (as you also point out) some more realistic
> sampling methods entail larger sampling variance than SRS, some of them
> have _smaller_ variance -- notably, stratified designs when the strata
> differ between themselves on the quantity being measured.

sure ... i know that

(then i said) ... but, we KNOW that most samples are drawn in a way that is 
WORSE than SRS



and you responded

> I don't think _I_ know this.  I know that SOME samples are so drawn;
> but (see above) I also know that SOME samples are drawn in a way that
> is BETTER than SRS (where I assume by "worse" you meant "with larger
> sampling variance", so by "better" I mean "with smaller sampling
> variance").

i think we do know this ... if you enumerate all the situations you know of 
where sampling from some larger population has been done ... i would bet a 
dollar to a penny that ... the sampling plan is WORSE than SRS  ... so, i 
would suggest that the NORM is worse ... the exception is SRS or better

i don't think books spend nearly enough time ... on the fact that most day 
in day out samples are taken in a pretty pathetic way ...


> I perceive the basic problem as the fact that sampling variance is
> (relatively) easily calculated for a SRS, while it is more difficult
> to calculate under almost _any_ other type of sampling.

sure ... but, books ONLY seem to discuss the easy way ... and i do too ... 
because it seems rather straight forward ... but, given time constraints 
... it never goes further than that ...

> Whether it is enough more difficult that one would REALLY like to avoid
> it in an elementary course is a judgement call;  but for the less
> quantitatively-oriented students with whom many of us have to deal, we
> _would_ often like to avoid those complications.  Certainly dealing with
> the completely _general_ case is beyond the scope of a first course, so
> it's just a matter of deciding how many, and which, specific types of
> cases one is willing to shoehorn into the semester (and what previews
> of coming attractions one wishes to allude to in higher-level courses).

however, we do become sticklers for details ... and force students to use 
the correct CVs, make the right CIs, ... do the t tests correctly ... and 
heaven forbid if you get off a line or two when reading off the values from 
the t table ...


> Seems to me the most sensible adjustment (and of a type we make at
> least implicitly in a lot of other areas too) is
> = to acknowledge that the calculations for SRS are presented
>    (a) for a somewhat unrealistic ideal kind of case,

i would stress ... really unrealistic ...

>    (b) to give the neophyte _some_ experience in playing this game,

and then leave them hanging

> Some textbooks I have used (cf. Moore, Statistics: Concepts &
> Controversies (4th ed.), Table 1.1, page 40) present a table giving the
> margin of error for the Gallup poll sampling procedure, as a function of
> population percentage and sample size.  Such a table permits one to show
> how Gallup's precision varies from what one would calculate for a SRS,
> thus providing some small emphasis for the cautionary tale one wishes to
> convey.

but ... in moore and mccabe ... the stress throughout the book ... is on 
SRSes ... and no real mention is made nor solutions to ... the problems 
that it will be a rare day in analysis land ... for the typical person 
working with data ... to be doing SRS sampling ...
it's just not going to happen

the bottom line, IMHO, is that we glide over this like it is not a problem 
at all ... when we know it is


  
> Donald F. Burrill [EMAIL PROTECTED]
> 184 Nashua Road, Bedford, NH 03110  603-471-7128

_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: SRSes

2001-07-24 Thread Jerry Dallal

Dennis Roberts wrote:

> but, we KNOW that most samples are drawn in a way that is WORSE than SRS ...
>
> thus, essentially every CI ... is too narrow ... or, every test statistic
> ... t or F or whatever ... has a p value that is too LOW ...
>
> what adjustment do we make for this basic problem?

We do it anyway!  The real concern isn't that CIs are too narrow or
that Ps are too liberal, but that they are completely irrelevant. 
If it's impossible to specify a sampling model, there's no formal
basis for inference.  (I'm ignoring randomized trials, which can be
valid without being generalizable.)  

For better or worse, here's what I tell my students in an attempt at
honesty...

Sometimes the pedigree of a sample is uncertain, yet standard
statistical techniques for simple random samples are used. The
rationale behind such analyses is best expressed in a reworking of a
quotation from Stephen Fienberg, in which the phrases "contingency
table" and "multinomial" have been replaced by "survey" and "simple
random":

"It is often true that data in a [survey] have not been
produced by a [simple random] sampling procedure, and that the
statistician is unable to determine the exact sampling scheme which
was used. In such situations the best we can do, usually, is to
assume a [simple random] situation and hope that it is not very
unreasonable."

This does not mean that sampling issues can be disregarded. Rather,
it says that in some instances we may treat data as though they
arose from a simple random sample, barring evidence that such an
approach is inappropriate.





RE: SRSes

2001-07-24 Thread Simon, Steve, PhD

Dennis Roberts writes:

> most books talk about inferential statistics ... particularly those
> where you take a sample ... find some statistic ... estimate some error
> term ... then build a CI or test some null hypothesis ...
>
> error in these cases is always assumed to be based on taking AT LEAST a
> simple random sample ... or SRS as some books like to say ...
>
> but, we KNOW that most samples are drawn in a way that is WORSE than SRS ...
>
> thus, essentially every CI ... is too narrow ... or, every test
> statistic ... t or F or whatever ... has a p value that is too LOW ...
>
> what adjustment do we make for this basic problem?

Another thought provoking question from Penn State.

In the real world, most people assess the deviation from SRS in a
qualitative (non-quantitative) fashion. If the deviation is serious, then
you treat the result as a more preliminary finding, or one that is in greater
need of replication. If it is very serious, you disregard the findings
from the study entirely. The folks in Evidence Based Medicine talk about
levels of evidence, and this is one of the things they would use to decide
whether a study represents a higher or lower level of evidence.

You probably do the same thing when you assess problems with non-response
bias, recall bias, and subjects who drop out in the middle of the study.
Typically you assess these in a qualitative fashion because it is so
difficult to quantify how much these will bias your findings.

You could argue that this represents the classic distinction between
sampling error and non-sampling error. The classic CI is almost always too
narrow, because it only accounts for some of the uncertainty in the model.
We are getting more sophisticated, but we still can't quantify many of the
additional sources of uncertainty.

By the way, if you take a non-SRS sample and then randomly allocate these
patients to a treatment and control group, the CI appropriately accounts for
uncertainty within this population, but you have trouble extrapolating to
the population that you are more interested in. It's the classic internal
versus external validity argument.

I hope this makes sense and is helpful.

Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.
STATS: STeve's Attempt to Teach Statistics. http://www.cmh.edu/stats
Watch for a change in servers. On or around June 2001, this page will
move to http://www.childrens-mercy.org/stats






RE: SRSes

2001-07-24 Thread dennis roberts

my hypothesis of course is that more often than not ... in data collection 
problems where sampling is involved AND inferences are desired ... we goof 
far more often ... than do a better than SRS job of sampling

1. i wonder if anyone has really taken a SRS of the literature ... maybe 
stratified by journals or disciplines ... and tried to see to what extent 
sampling in the investigations was done via SRS ... better than that ... or 
worse than that??? of course, i would expect even if this is done ... we 
would have a + biased figure ... since, the notion is that only the 
better/best of the submitted stuff gets published so, the figures for all 
stuff that is done (ie, the day in day out batch), published or not ... 
would have to look worse off ...

2. can worse than SRS ... be as MUCH worse ... as complex sampling plans 
can be better than SRS??? that is ... could a standard error for a bad 
sampling plan (if we could even estimate it) ... be proportionately as much 
LARGER than the standard error for SRS samples ... as complex sampling 
plans can produce standard errors that are as proportionately SMALLER than 
SRS samples? are there ANY data that exist on this matter?


==
dennis roberts, penn state university
educational psychology, 8148632401
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: SPC in Iron Casting Foundry

2001-07-24 Thread AaronA

http://www.afslibrary.com/

The site for the Library of the American Foundry Society.

--
Aaron Gesicki
Sparta, Wisconsin
Coulee Country - 40 km from the Mississippi
AAW - Northeastern Wisconsin & Coulee Region
Northeastern Wisconsin Woodworkers Guild




=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=