Re: Degrees of Freedom

2000-04-28 Thread Donald F. Burrill

On Thu, 27 Apr 2000, GEORGE PERKINS wrote:

 I got a call the other day from a high school science teacher asking
 about the following: 
 
 She is testing different brands of yogurt for acid neutralization by
 acidophilus bacteria. 

O.K.  To start with we have some unspecified number  b  of brands 
of yogurt;  it follows that either we want to average them all together, 
so as to ignore any systematic differences that may exist between brands, 
or we want to keep them explicitly separate, so that we can detect (or at 
any rate attempt to detect) systematic differences between brands. 
 If b > 2, t-tests are already to be discarded in favor of analysis of 
variance (ANOVA).

 Her students have measured the pH of the yogurt, then
 poured in a known amount of acid, and measured pH at intervals of 
 1 minute for 5 minutes. She has six replicates for each of the types of
 yogurt, for a total of 12 time series. 

If there are only 12 time series, then it appears  b = 2.  Yes?
Now the manipulation and measuring seem to have been carried out by some 
(also unspecified number of) students.  Are the six replicates associated 
with six students, each of whom carried out one replicate?  Or is the 
procedure followed rather messier than that?  And if the several students 
aren't equivalent to the replicates, in what precisely do the replicates 
consist?

 She wants to test if the mean concentration of acid is different in 
 the two groups by taking (initial pH value - final pH value) for each 
 replicate, getting a total of six differences per group, then finding the 
 mean of the differences for each set. 

Only initial vs. final?  What was the point of the 1-minute-apart 
administration of acid and measurement of pH, if one is going to ignore 
the time-series information altogether?

 Finally, she wants to take the means from each set of differences and 
 do a hypothesis test mu1=mu2 using a t-test, but can't figure out the 
 degrees of freedom for the test, and frankly I am not quite sure either. 

Why?  That is, why a t-test?  Because that's the only form of analysis 
she knows how to do?  The situation clearly calls for a repeated-measures 
ANOVA;  and I'd bet that if she actually does treat it as a t-test 
(comparing Brand B with Brand X, I'd guess?), which could be equivalent 
to the formal test of one of the main effects in the proper ANOVA, she 
won't correctly calculate the sampling variance of the two means.  If it 
be the case that she doesn't know how to do ANOVA, point her gently in 
the direction of Bruning & Kintz, Computational Handbook of Statistics,
which must be in a 4th or 5th edition by now.  Marvellous cookbook -- 
leads the naive (or for that matter not so naive) reader through the 
necessary arithmetic step by step [rather as though one were writing a 
computer program for the computer between one's ears] for a _wide_ 
variety of formal analyses, and supplies references for those who want to 
pursue the matter further.

 Her idea is to take 12-2 degrees but others have said it should be 6-1 
 degrees. I wonder if others out there can shed light on three issues: 

Well, let's see.  If I've sorted this out aright, she has six replicates 
(r = 6) of time-series measurements (t = 6) on each of two brands of 
yogurt (b = 2).  Looks like 72 measurements all together.  Presumably the 
six time points are conceptually or logically equivalent for all 12 time 
series, so replicates (R) are crossed with time (T), and they are 
necessarily nested within brand (B);  we have therefore a formal design 
of the form  R(B)xT  -- a repeated measures design. 
 The formal ANOVA table will have the following lines:

Source                df    Error term

Brand                  1    R(B)
Replicates(Brand)     10    ---
Time                   5    TR(B)
Brand x Time           5    TR(B)
Time x Repl (Brand)   50    ---

TOTAL                 71

(Another name for "Error term" is "denominator mean square".)
If she decides to discard all the data in the time series except for the 
first and last measurements, then there's only 1 d.f. for Time, and only 
1 d.f. for the Time-by-Brand interaction, and 10 d.f. for TR(B).
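The degree-of-freedom bookkeeping above reduces to a few lines of arithmetic; a sketch (variable names are mine, with b, r, t as defined in the text):

```python
# Degrees of freedom for the R(B)xT repeated-measures design:
# replicates nested in brand, crossed with time.
b, r, t = 2, 6, 6  # brands, replicates per brand, time points

df_brand = b - 1                         # 1
df_repl_in_brand = b * (r - 1)           # 10 (error term for Brand)
df_time = t - 1                          # 5
df_brand_x_time = (b - 1) * (t - 1)      # 5
df_time_x_repl = b * (r - 1) * (t - 1)   # 50 (error term for Time effects)

total = (df_brand + df_repl_in_brand + df_time
         + df_brand_x_time + df_time_x_repl)
assert total == b * r * t - 1            # 71
```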

 1) Is the t-test approach she is using on solid statistical footing, 
and if so how many degrees of freedom are to be used for the t-test? 

Well, _I'd_ use ANOVA myself.  Error d.f. for Brand are 10.

 2) If the t-test approach is not legitimate what type of statistical 
test can be used to test the mu1=mu2 hypothesis? (keep in mind that 
these are high school students)
 
Discussed at length above.  Get Bruning & Kintz.

 3) Is there a 'better' way to proceed with the analysis in the future 
for these types of experiments?
Yes.

 If you want to answer could you please forward the response to my 
 

Re: Is Bootstrapping Appropriate?

2000-04-28 Thread Greg Heath

Date: Wed, 26 APR 2000 22:05:01 -0400
From: Zubin [EMAIL PROTECTED]

 Yes, I believe your methodology will work.  However, you should sample a
 window of data rather than a single data point when you calculate your
 re-sampling statistics.  I am not sure on the window size, though.

1. Based on the 1 sec 1/e decorrelation time, or on another 
   characteristic time, T,  based on thresholding the autocorrelation at 
   another level, combine the values in the window (t-T/2,t+T/2) to create 
   a value associated with the random draw that picked sample x(t).

2. Treat this value as if it were a random draw of independent measurements?

If this is what you meant I don't think that defeats the correlation 
problem if the draws are performed with replacement. If the draws are 
performed without replacement, wouldn't you need to exclude the whole 
window of length T? ... I don't think that would yield many values before 
you ran out of allowable windows.

To defeat that problem maybe I should just subsample uniformly at, say, one 
pulse every T seconds to obtain M ~ 20*T samples with N ~ 26/T independent 
measurements per sample. Then perform M tests. For T ~ 2 sec (i.e., 2 
decorrelation times), I get M ~ 40 tests with ~ 13 measurements per test. I 
then get M ~ 40 test probabilities to average.
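That subsampling scheme, in a few lines (a sketch: x stands in for the 526 measurements, and the offsets generate the M interleaved subsamples; all names are mine):

```python
import numpy as np

fs = 20              # pulses per second
T = 2.0              # ~2 decorrelation times, in seconds
step = int(fs * T)   # 40 samples between kept pulses

x = np.random.randn(526)   # stand-in for the 526 radar measurements

# One subsample per starting offset: M = step subsamples, each with
# roughly 526/step ~ 13 nearly independent points.
subsamples = [x[offset::step] for offset in range(step)]

M = len(subsamples)        # ~ 40 tests
N = len(subsamples[0])     # ~ 13 measurements per test
```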

For the purposes of argument, let's say the procedure in the above 
paragraph is acceptable. Then would a resampling based on randomly picking 
one measurement from each of the 13 windows be better? That way I would 
get  ~ 40^13 possible combinations instead of 40. 

Is the last paragraph what you were suggesting?

Thanks,

Greg

Gregory E. Heath [EMAIL PROTECTED]    The views expressed here are
M.I.T. Lincoln Lab   (781) 981-2815        not necessarily shared by
Lexington, MA        (781) 981-0908 (FAX)  M.I.T./LL or its sponsors
02420-9185, USA

 Greg Heath [EMAIL PROTECTED] wrote in message
 news:Pine.SOL.3.91.1000426203238.20192C-10@miles...
  Can you help or lead me to the appropriate reference?
 
  I have 526 radar measurements evenly sampled over 26.25 sec (i.e., pulse
  repetition frequency = 20 points per second).
 
  mean  =  0.0
  stdv  =  1.2
  t0 = 1 sec (1/e decorrelation time from the autocorrelation
 function)
 
  I want to test the null hypothesis that these correlated measurements
  could have been drawn from a zero-mean Gaussian distribution.
 
  However I don't believe I have enough independent measurements.
 
  Will bootstrapping help? i.e.,
 
  --- SNIP ---


===
This list is open to everyone.  Occasionally, less thoughtful
people send inappropriate messages.  Please DO NOT COMPLAIN TO
THE POSTMASTER about these messages because the postmaster has no
way of controlling them, and excessive complaints will result in
termination of the list.

For information about this list, including information about the
problem of inappropriate messages and information about how to
unsubscribe, please see the web page at
http://jse.stat.ncsu.edu/
===



Re: Is Bootstrapping Appropriate?

2000-04-28 Thread Greg Heath

Date: Fri, 28 APR 2000 00:00:45 GMT
From: [EMAIL PROTECTED]

  1. Randomly draw, with replacement, 526 measurements.
 
 You are only justified in resampling in this way if you know that all
 your observations are iid. I didn't quite follow your problem but it
 sounds that the iid assumption is not justified. If your obs were iid
 then there would probably be much easier ways of testing your
 hypothesis than bootstrap.
 
 Could you please explain your problem again... How many observations of
 the time series (1...T) do you have (i.e. what is T?)  How many
 variables are in each observation of the time series (20?)?  What are you
 testing about this time series?

One variable, 20 measurements per second, 26.25 seconds (526 measurements).
The 1/e decorrelation time estimated from the autocorrelation function is 
~ 1 second. Therefore, I will get independent measurements approximately 
every T0 seconds (probably ~2 <= T0 <= ~4 sec)

Could these correlated measurements have come from a Gaussian distribution?

Please see my responses to the other replies.

Greg

Hope this helps.

Gregory E. Heath [EMAIL PROTECTED]    The views expressed here are
M.I.T. Lincoln Lab   (781) 981-2815        not necessarily shared by
Lexington, MA        (781) 981-0908 (FAX)  M.I.T./LL or its sponsors
02420-9185, USA





Re: Q: error on RMS, __please__ help.

2000-04-28 Thread Selim Issever

I am sorry for the confusion. English is not my native language and sometimes
I am not precise enough.

What I meant by the term error was the statistical error of a measurement.
I am interested in the statistical relevance of the measurement (the
confidence interval within which the measured value is correct with a
probability of 68.3%, i.e. the probability that the measured value lies
within 1 sigma of the real value).

And for sure, with 100 measurements I cannot measure the _real_ distribution
and thus not measure the real rms. But I can estimate the rms, and then I
should give a number for how good this estimate is.
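Under a Gaussian assumption and for *independent* measurements, a standard large-sample approximation gives that number directly; a sketch (the value of s is illustrative):

```python
import math

n = 100   # number of measurements
s = 1.0   # estimated rms (sample standard deviation), illustrative value

# Large-sample standard error of s for iid Gaussian data:
#   Var(s) ~ sigma^2 / (2(n-1)),  so  SE(s) ~ s / sqrt(2(n-1))
se_s = s / math.sqrt(2 * (n - 1))
# With n = 100 this is about 7% of s, i.e. the estimated rms is good
# to roughly +/- 7% at the 1-sigma (68.3%) level.
```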

Rich Ulrich wrote:
 
 On Thu, 27 Apr 2000 14:43:08 +0200, Selim Issever
 [EMAIL PROTECTED] wrote:
 
  Dear all,
 
  I measure a physical quantity about 100 times. I am not interested in the
  mean value but the spread (the RMS) of this quantity. I can calculate the RMS
  easily, but I also need the error on the RMS. Could you give me a hint how to
  calculate the error on the rms?
 
 From your description, there is no reason to think that there has to
 be any "error" at all.
 
 You have a set of measures.  They are somewhat spread, for real,
 physical reasons.  The dispersion looks like gaussian, but it would
 not have to be that shape.  (How were the points selected?  Why were
 they selected?)  If you want to describe the spread of that set of
 measures by the RMS, you may do so -- though it might be more useful,
 it seems to me, to describe the extremes and the conditions that
 produced them.
 
 Why do you think there may be error in the measurements, and how would
 you detect it if there were?
 
 
  May be I should add, that the spread is not due to the measurement, but real.
  A good example would be a metal bar, which expands and shrinks due to
  stochastic temperature effects. The value I would be interested in is the
  _length_variation_ and an _error_estimation_ for this value.
 
  The distribution of the quantity I am looking at could be approximated by a
  gaussian (just in case it eases the discussion). At least it looks like a
  gaussian, when I histogram it.
 
 --
 Rich Ulrich, [EMAIL PROTECTED]
 http://www.pitt.edu/~wpilib/index.html

-- 
Selim Issever | Tel: 040 8998-2843+-  "Thou shalt not drink and
DESY-F15  | Fax: 040 8998-4033+-  bake at the same time."  -
Notkestr. 85  | [EMAIL PROTECTED] +- A. Schwarzenegger/Der Cityhai
22603 Hamburg/Germany   |  http://www.physik.uni-dortmund.de/~issevers





Re: Question about kappa

2000-04-28 Thread David Cross/Psych Dept/TCU

I think I would consider using generalizability theory for this problem.
Shavelson and Webb have a good book out on the subject, published by Sage.
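As a baseline for comparison, the ordinary two-rater Cohen's kappa (not the multi-rater Fleiss variant Bob asks about) can be sketched as follows; the counts in the table are made up:

```python
import numpy as np

# Agreement table for two raters over two categories.
# Rows: rater 1's category, columns: rater 2's category.
counts = np.array([[20,  5],
                   [10, 65]], dtype=float)
n = counts.sum()

p_obs = np.trace(counts) / n       # observed agreement
marg1 = counts.sum(axis=1) / n     # rater 1 marginal proportions
marg2 = counts.sum(axis=0) / n     # rater 2 marginal proportions
p_exp = (marg1 * marg2).sum()      # agreement expected by chance

kappa = (p_obs - p_exp) / (1 - p_exp)
```

The special circumstances in Bob's design (raters varying across events, events clustered within subjects) are exactly what this plain form does not handle.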

On Thu, 27 Apr 2000, Robert McGrath wrote:

 I am looking for a formula for kappa that applies for very special
 circumstances:
 
 1) Two raters rated each event, but the raters varied across event.
 2) The study involved 100 subjects, each of whom generated approx. 17 events,
 so multiple events were generated by the same subject.
 
 I know Fleiss has developed a formula for kappa that allows for multiple
 sets of raters, but is there a formula that is appropriate for the
 circumstance I have described?  Thanks for your help!
 
 Bob
 
 -
 
 Robert McGrath, Ph.D.
 School of Psychology T110A
 Fairleigh Dickinson University, Teaneck NJ 07666
 voice: 201-692-2445   fax: 201-692-2304
 
 - Original Message -
 From: "Bob Wheeler" [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: Thursday, April 27, 2000 3:15 PM
 Subject: Sample size and distributions programs
 
 
  I have uploaded two programs that some may find of
  use:
 
  (1) Tables. A Windows program written quite a few
  years ago. It treats 42 distributions extensively
  including plots and technical documentation.
  (2) SSize. A sample size program for the Palm
  devices. It treats linear models for several
  distributions: normal, binomial, Poisson, and
  chi-squared. ANOVA, t-tests, logistic, etc. There
  is a fairly extensive documentation in pdf format.
  This is a new program, so there are undoubtedly
  bugs.
  I would greatly appreciate hearing about them.
 
  They are at  http://www.bobwheeler.com/stat/
 
 
  --
  Bob Wheeler --- (Reply to: [EMAIL PROTECTED])
  ECHIP, Inc.
 
 
 



Statistical Software

2000-04-28 Thread mattcfenn

I need to find a statistical software package. Most of my statistical
work has been done using Microsoft Excel. This has worked out fine;
however, I need to find a more heavy-duty package, but nothing
overwhelming. I perform some simple statistical work but would like to
begin to use a more powerful package. Any suggestion would be great.

Thanks


Sent via Deja.com http://www.deja.com/
Before you buy.





RE: Data Mining blooper and Related Subjects

2000-04-28 Thread Silvert, Henry

I respectfully disagree with Michael Wyatt. I come from an academic
background and now work outside of academia, except for the occasional
course here or there. I too report to a manager or managers, depending on
the circumstances. But my experiences have not been the same as his. I am
constantly urged to use all my skills as a statistician and a research
methodologist by "my managers." (Horrid!!!) 

Henry M. Silvert PHD
Research Statistician
The Conference Board
845 3rd. Avenue
New York, NY 10022
Tel. No.: (212) 339-0438
Fax No.: (212) 836-3825

 -Original Message-
 From: [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]]
 Sent: Friday, April 28, 2000 7:52 AM
 To:   [EMAIL PROTECTED]
 Subject:  Re: Data Mining blooper and Related Subjects
 
 ...And it extends even further. Many of us who toil in areas outside of
 academia have our work and productivity "supervised" by managers or
 directors who have little or no training in statistics, beyond a survey
 course. They receive the flashy brochures and read the ads that promise
 analytical software that will provide significant information, without
 the bother of formulating one of those fancy-shmancy hypotheses.
 
 The higher-ups come to view data mining, decision support, outcomes
 analysis,  etc. as requiring no more skill than the ability to use a PC.
  I call it "The Myth of the Statistical Meat Grinder".  The push of a
 button or two will generate the answer to all corporate questions, plus a
 few neat-o graphs for the board of directors' packets.
 
 Michael T. Wyatt, Ph.D.
 (Embittered) Healthcare Analyst
 Quality Improvement Dept.
 DCH Regional Medical Center
 Tuscaloosa, AL
 
 
 
 On Wed, 26 Apr 2000 11:38:28 -0400 dennis roberts [EMAIL PROTECTED]
 writes:
  At 07:57 AM 4/26/00 -0500, Herman Rubin wrote:
  
  
  It does not surprise me one bit.  The typical statistics
  course teaches statistical methods and pronouncements, with
  no attempt to achieve understanding.   snip of more
  
  this is something i happen to agree with herman about ... but, it is 
  a much 
  broader problem than can be attributed to what happens in one course
  
  it is an attitude about what higher education is all about ... and 
  what the 
  goals are for it
  
  'going to college' ... be it undergraduate level or graduate level 
  ... has 
  become a much more hit and miss experience, residence has little 
  meaning 
  ... that is being tailored more and more to the convenience of 
  students ... 
  and to what is 'user' friendly (or it won't SELL). studying 
  principles in 
  disciplines is hard work ... NOT user friendly ... so, less and less 
  is 
  being required in the way of diligent study.
  
  take graduate school for example ... there was a time, was there not 
  ... 
  where doctoral students were REALLY expected to be responsible for 
  their 
  dissertations AND were expected to be the experts in that particular 
  area 
  of inquiry ... AND to be competent enough to have done the work 
  him/herself 
  ... and to UNDERSTAND it .. ie, BE ABLE TO DEFEND ALL OF IT
  
  but, what i have noticed over many years is that dissertations are 
  becoming 
  more of a committee effort ... yes, the student MAY have had the 
  idea 
  (though not necessarily) but, from there ... he/she gets help with 
  the 
  design ... has someone else do the analysis (because he/she did not 
  take 
  any/sufficient work in analytic methods to understand what is going 
  on) ... 
  gets help in writing and editing .. and, even gets help in terms of 
  what 
  their results MEAN ...
  
  gives new meaning to the term: "cooperative learning"
  
  
  
  
  
 

Re: Is Bootstrapping Appropriate?

2000-04-28 Thread Herman Rubin

In article Pine.SOL.3.91.1000428033622.20399C-10@miles,
Greg Heath  [EMAIL PROTECTED] wrote:
Date: Fri, 28 APR 2000 00:00:45 GMT
From: [EMAIL PROTECTED]

...

One variable, 20 measurements per second, 26.25 seconds (526 measurements).
The 1/e decorrelation time estimated from the autocorrelation function is 
~ 1 second. Therefore, I will get independent measurements approximately 
every T0 seconds (probably ~2 <= T0 <= ~4 sec)

Could these correlated measurements have come from a Gaussian distribution?

Please see my responses to the other replies.

Bootstrapping is totally inappropriate.  However, there
are other, simpler simulation methods of obtaining the
significance level, using any test statistic you wish to
use, assuming you are willing to use the particular value
of the correlation coefficient and you are using a
scale-invariant test.  The variance will not affect your
test in this problem.  BTW, this method is the one used
for obtaining significance levels for the
Kolmogorov-Smirnov test when parameters are estimated.

Construct samples according to the null hypothesis.  The
samples should be independent; the dependence within each
sample should follow the model.  Then use the empirical
distribution to determine the significance of your data set.
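A minimal sketch of that procedure, assuming Gaussian AR(1) dependence matched to the ~1 s (20-sample) decorrelation time quoted in the thread; the choice of test statistic and all names here are mine, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 526
rho = np.exp(-1.0 / 20.0)  # AR(1) coefficient giving a 20-sample
                           # (~1 s at 20 Hz) 1/e decorrelation time

def ar1_series(n, rho, rng):
    """Zero-mean, unit-variance Gaussian AR(1) series."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    innov_sd = np.sqrt(1.0 - rho**2)
    for i in range(1, n):
        x[i] = rho * x[i - 1] + innov_sd * rng.standard_normal()
    return x

def statistic(x):
    """A scale-invariant statistic: absolute standardized skewness."""
    z = (x - x.mean()) / x.std()
    return abs((z**3).mean())

# Build the empirical null distribution from samples constructed
# under the null hypothesis, then rank the observed series in it.
null_stats = np.array(
    [statistic(ar1_series(n, rho, rng)) for _ in range(500)])

observed = ar1_series(n, rho, rng)  # stand-in for the real data
p_value = (null_stats >= statistic(observed)).mean()
```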




-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





RE: Blackjack problem

2000-04-28 Thread Simon, Steve, PhD

Paul Bernhardt writes:

True, but card counters abound. Last month's (April, 00) Discover 
Magazine had an article on gambling and mentioned a newly developed card 
counting strategy that you don't need to be a genius to execute 
effectively. I have a buddy who has placed in a Vegas Blackjack 
tournament. He counts cards, using his foot position to keep track of the 
aces (very important, as they are needed for Blackjacks). There are no 
casino cameras to monitor your foot position (yet) so that you can get 
away with it.

I have a friend who is a professional black jack player, and from what I
understand it is a bit more complicated than that. You have to change your
betting behavior substantially when the deck is loaded with aces and face
cards. If you don't change your betting behavior on the basis of the card
count, how could you gain any advantage? You can walk away from a table when
the deck has very few aces and face cards, but that doesn't help you as much
as increasing your bets when the deck is in your favor.

It is this change in betting behavior that tips off the casinos.

I have a running joke here at the hospital about how we need to put some
money in the budget for these research grant proposals for some applied
probability research at Vegas.

Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.
STATS - Steve's Attempt to Teach Statistics: http://www.cmh.edu/stats





Re: Blackjack problem

2000-04-28 Thread Rich Ulrich

On 27 Apr 2000 13:50:24 -0700, [EMAIL PROTECTED] (Donald F.
Burrill) wrote:
 [ ... ]
  (3)  It is true for Blackjack, unlike nearly all other Las Vegas-type 
 games, that a variable strategy on the part of the Player can change the 
 statistical advantage to the Player's side.  It should not surprise you 
 that only a certain type of variable strategy, among a very large number 
 of possible strategies, can have this effect;  and that Players showing 
 evidence of pursuing such a strategy very rapidly become persona non 
 grata at the gaming tables.  

 - make that, something about, Players
"SUCCESSFULLY pursuing such a strategy "
are the ones who are asked to leave.  (And
they remember your face, and pass around 
your picture.)

Card-counting brought a lot of new suckers into
the casinos, so I read.  Some of them couldn't pay 
enough attention in any case, some couldn't pay
enough attention once they had a few drinks, and
some kept trying even when the house implemented
multiple-decks and frequent shuffling.  

Also, in addition to pure odds, there is that aspect of
"going broke."  If a game is fair, remember, the winner
in the long run is the player with deeper pockets at the 
start.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





RE: Blackjack problem

2000-04-28 Thread Magill, Brett

Clip from earlier message...

"The Player may choose to play exactly the same rules
as the Dealer is REQUIRED to play; or the Player may choose some of the
other
options. Since the Player has more choices or options in play than does the
Dealer, why does the Dealer have the statistical advantage?  It seems to me
the
Player would have the advantage."



Doesn't the law of large numbers figure in here somewhere too:

1.  The probability of winning with the house strategy is known a priori and
it is optimal (as someone else pointed out).
2.  An individual playing with this same strategy may win or lose more or
less in the short run.
3.  With the volume of games the house plays, the empirical probability will
approach the a priori probability in the long run--to the house's advantage.

Simplistic and poorly articulated I am sure, but I think it captures the
essence of the mechanism at work here.







Re: Blackjack problem

2000-04-28 Thread Robert Dawson

 "The Player may choose to play exactly the same rules
 as the Dealer is REQUIRED to play; or the Player may choose some of the
 other
 options. Since the Player has more choices or options in play than does
the
 Dealer, why does the Dealer have the statistical advantage?  It seems to
me
 the
 Player would have the advantage."
 


 Doesn't the law of large numbers figure in here somewhere too:

 1.  The probability of winning with the house strategy is known a priori
and
 it is optimal (as someone else pointed out).
 2.  An individual playing with this same strategy may win or lose more or
 less in the short run.
 3.  With the volume of games the house plays, the empirical probability
will
 approach the a priori probability in the long run--to the house's
advantage.

 Simplistic and poorly articulated I am sure, but I think it captures the
 essence of the mechanism at work here.

No. *If*, as originally suggested, the game were symmetric except for
certain possibly useful choices that the player could make and the dealer
could not, the expected winnings of the player would be positive, and the
expected winnings of the house negative, on each individual game [assuming
intelligent play].  Now, E(sum(X_i)) = sum(E(X_i)) regardless of
distribution or even joint distribution. So the house would lose in the long
run *because* it lost in the short run, not despite that.
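Robert's point that E(sum(X_i)) = sum(E(X_i)) holds under any dependence is easy to check numerically; a sketch in which the per-game edge and the dependence structure are made up:

```python
import numpy as np

rng = np.random.default_rng(1)

mu = 0.02            # hypothetical per-game edge for the player
n_games = 100
n_sessions = 20_000

# Games within a session share a common component Z, so they are
# strongly dependent -- yet each game still has expectation mu.
Z = rng.standard_normal((n_sessions, 1))
eps = rng.standard_normal((n_sessions, n_games))
winnings = mu + 0.5 * Z + 0.5 * eps

session_totals = winnings.sum(axis=1)
# E(total) = n_games * mu = 2.0, dependence notwithstanding:
# the average session total hovers near 2, not near 0.
```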

What is going on is that the rules of the game are not as was supposed.
Ties in which both hands are under 22 are [with some exceptions? help me!]
no-win-no-lose, but if both player and dealer bust, the house wins.

-Robert






Re: Process Capability / Specification Limits

2000-04-28 Thread Jeff Falk

Ed,
Was the spec written with an understanding of the measurement resolution?
Why not ask whoever wrote the spec?
I have been following numerous discussions through other sources about
design, GD&T (geometric dimensioning and tolerancing), and metrology. Miscommunication is a major problem.
Statistics won't help you decide what the person who wrote the spec meant.

Jeff Falk

[EMAIL PROTECTED] wrote:

 I'm doing a process capability study.  The spec. is 1.1 +/-.1  What is
 the argument against using limits of .95 and 1.24.  The idea is any
 measurement within this window would round within the actual spec.

 If the characteristic must be measured with a resolution of .05, does
 this change the argument?  Any help to settle this argument is greatly
 appreciated.

 Ed
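Whatever the spec's intent, the arithmetic behind Ed's proposed window can be checked mechanically. A sketch; the half-up rounding convention is my assumption (a gauge reading would typically be recorded that way), and it shows why the window is asymmetric: 1.25 would round up and out.

```python
from decimal import Decimal, ROUND_HALF_UP

def rounds_within_spec(x, lo="1.0", hi="1.2"):
    # Round a measurement to one decimal (half up) and check whether the
    # rounded value lands inside the spec window [lo, hi].
    r = Decimal(str(x)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return Decimal(lo) <= r <= Decimal(hi)

# The proposed capability limits from the post:
print(rounds_within_spec(0.95))   # True:  0.95 rounds to 1.0
print(rounds_within_spec(1.24))   # True:  1.24 rounds to 1.2
print(rounds_within_spec(1.25))   # False: 1.25 rounds (half up) to 1.3
```

Decimal arithmetic is used deliberately: binary floats make `round(0.95, 1)` unreliable, which is itself a trap in this kind of argument.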

 Sent via Deja.com http://www.deja.com/
 Before you buy.






no correlation assumption among X's in MLR

2000-04-28 Thread EAKIN MARK E


Besides independent normal errors with mean zero and constant
variance, some (many?) econometric textbooks do make the assumption that
the independent variables are uncorrelated. For example see

Gujarati, Damodar (1988), _Basic Econometrics_, 2nd edition, McGraw-Hill,
p. 166.





Mark Eakin  
Associate Professor
Information Systems and Management Sciences Department
University of Texas at Arlington
[EMAIL PROTECTED] or
[EMAIL PROTECTED]






Re: Data Mining blooper and Related Subjects

2000-04-28 Thread Debasmit Mohanty

I have been following the discussion on the Data Mining blooper for a while. 
Being a first-year graduate student in statistics, my comments on this issue 
might sound premature. Nevertheless, I would like to put forward my 
observations.

What I have learnt so far from my interactions with statisticians in 
academia as well as in industry is the following:

1) Many statisticians still feel that "Data Mining" as a discipline 
should be left to the people in computer science.
Of course, I don't agree with this statement at all. If you read the paper 
"Data Mining and Statistics" by Dr. J. Friedman, you would realize how 
statisticians have neglected this emerging field over the last few years.

2) There are few statistics graduate programs which emphasize "Data 
Mining" research. Of course, there are a few, like Carnegie Mellon's.
But overall, we have yet to give the field the attention it needs.

I think now is the time when we have to decide: "Do we accept DATA MINING as 
a part of statistics, or do we keep neglecting this field as before?"

I am sure there are a few statistics students like me who feel that Data 
Mining is very much a part of statistics.

Thanks
Debasmit

--
Debasmit Mohanty
Graduate Student - Statistics
http://bama.ua.edu/~mohan001/
--


Date: Wed, 26 Apr 2000 11:38:28 -0400
From: dennis roberts [EMAIL PROTECTED]
Subject:

At 07:57 AM 4/26/00 -0500, Herman Rubin wrote:


It does not surprise me one bit.  The typical statistics
course teaches statistical methods and pronouncements, with
no attempt to achieve understanding.   snip of more

this is something i happen to agree with herman about ... but, it is a much
broader problem than can be attributed to what happens in one course

it is an attitude about what higher education is all about ... and what the
goals are for it

'going to college' ... be it undergraduate level or graduate level ... has
become a much more hit and miss experience, residence has little meaning
... that is being tailored more and more to the convenience of students ...
and to what is 'user' friendly (or it won't SELL). studying principles in
disciplines is hard work ... NOT user friendly ... so, less and less is
being required in the way of diligent study.

take graduate school for example ... there was a time, was there not ...
where doctoral students were REALLY expected to be responsible for their
dissertations AND were expected to be the experts in that particular area
of inquiry ... AND to be competent enough to have done the work him/herself
... and to UNDERSTAND it .. ie, BE ABLE TO DEFEND ALL OF IT

but, what i have noticed over many years is that dissertations are becoming
more of a committee effort ... yes, the student MAY have had the idea
(though not necessarily) but, from there ... he/she gets help with the
design ... has someone else do the analysis (because he/she did not take
any/sufficient work in analytic methods to understand what is going on) ...
gets help in writing and editing .. and, even gets help in terms of what
their results MEAN ...

gives new meaning to the term: "cooperative learning"







Unbalanced Nested (Hierarchical) Design

2000-04-28 Thread Arvind Shah


I have an UNBALANCED nested (also called hierarchical) design with Factor A
being fixed and the Factor B (within A) random.  So my ANOVA has the line
entries (for source): A, B(A), Error (or within cell) and total. I am
looking for the expected mean squares and approaches for computing
confidence intervals on the mean for different levels of A. Any help or
reference will be highly appreciated. 

Arvind Shah
Univ of South Alabama








Re: truncated Binomial

2000-04-28 Thread Herman Rubin

In article 8e7etv$msp$[EMAIL PROTECTED],  [EMAIL PROTECTED] wrote:
Hi,


Could anybody tell me how to write the density of the binomial
distribution when x=0 is not observed? Will the MLE of p be different than
X-bar in the case of the truncated binomial? How about the variance and the
bias of this estimator?

The probability distribution will be the conditional 
distribution.  Y = X-bar is still a sufficient statistic,
but unless Y = n, Y/n will not be the MLE.  In fact, if
Y = 1, the MLE is 0.

The MLE satisfies (1-q^n)*Y = np, where q = 1-p.  The
asymptotic mean and variance can be computed in the usual
manner for regular problems, but the actual mean and
variance are not simple.
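Herman's likelihood equation can be solved numerically. A sketch (the bisection solver and the example numbers are mine, not from the post): with q = 1-p, the MLE solves (1 - q^n)*Y = n*p, where Y is the mean of the zero-truncated sample.

```python
def truncated_binom_mle(ybar, n, iters=200):
    # Solve (1 - (1-p)**n) * ybar = n * p for p, by bisection on (0, 1).
    # ybar is the mean of a zero-truncated Binomial(n, p) sample, so ybar >= 1.
    if ybar <= 1:
        return 0.0                      # boundary case noted above: Y = 1 gives MLE 0
    f = lambda p: (1 - (1 - p) ** n) * ybar - n * p
    lo, hi = 1e-12, 1.0                 # f > 0 near 0; f <= 0 at p = 1 when ybar <= n
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

p_hat = truncated_binom_mle(2.5, 10)
print(round(p_hat, 4))   # a bit below 2.5/10, since truncation inflates the mean
```

The estimate falls below the naive Y/n because the truncated mean E(X | X > 0) = np/(1-q^n) exceeds np.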

-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: no correlation assumption among X's in MLR

2000-04-28 Thread dennis roberts

At 11:09 AM 4/28/00 -0500, EAKIN MARK E wrote:

Besides independent normal errors with mean zero and constant
variance, some (many?) econometric textbooks do make the assumption that
the independent variables are uncorrelated. For example see

Gujarati, Damodar (1988), _Basic Econometrics_, 2nd edition, McGraw-Hill,
p. 166.

first, this would only possibly apply in the inferential situation, using r 
to estimate rho ... but has nothing to do with the correlation between X 
and Y (r) in the data set at hand and what assumptions are made about the 
correlation coefficient ...

and secondly, independent variables are either correlated with each other 
(non 0) or not ... thus, only for some specific application ... such as ... 
how can we maximize the multiple R between a set of predictors AND a 
criterion ... would such a statement make sense ... and there it is not 
even an assumption ... just a limiting case for R

sure, for some specific econometric model based on some theory ... one 
might assume this to SIMPLIFY THE MODEL but ... that has nothing to do with 
independent variables per se






Re: Statistical Software

2000-04-28 Thread Donald F. Burrill

On Fri, 28 Apr 2000 [EMAIL PROTECTED] wrote:

 I need to find a statistical software package. Most of my statistical
 work has been done using Microsoft Excel. This has worked out fine;
 however, I need to find a more heavy-duty package, but nothing
 overwhelming. I perform some simple statistical work but would like to
 begin to use a more powerful package. Any suggestion would be great.

I like Minitab, myself.  One virtue may be that it behaves in some ways 
like a spreadsheet, and the data are stored (and displayed, if desired) in 
what Minitab calls a "worksheet", which looks very much like the database 
display of a spreadsheet package.  Its command language is 
straightforward and easy to learn, although these days Minitab Inc. seems 
to be downplaying that particular advantage in favor of menu-driven 
controls.  If you are a student, I believe there is a special deal 
available from Minitab;  perhaps one of my colleagues whose knowledge is 
more immediate than mine will care to comment.
-- DFB.
 
 Donald F. Burrill [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 603-535-2597
 184 Nashua Road, Bedford, NH 03110  603-471-7128  






Re: Data Mining blooper and Related Subjects (fwd)

2000-04-28 Thread Bob Hayden

- Forwarded message from Debasmit Mohanty -

I think now is the time when we have to decide: "Do we accept DATA MINING as 
a part of statistics, or do we keep neglecting this field as before?"

I am sure there are a few statistics students like me who feel that Data 
Mining is very much a part of statistics.

- End of forwarded message from Debasmit Mohanty -

It may be a disagreement over words.  Much of the work Tukey et
al. did in the 60s, called exploratory data analysis, had to do with
looking at data and trying to detect patterns.  However, if you sift
through data you will find many "patterns" that are just flukes of
chance.  How do you avoid taking these seriously?  This was a
criticism directed at Tukey then, and even more so at what goes on
today under the name of "Data Mining".  But I have a sense that Tukey
had a much deeper awareness of the underlying statistical issues than
most of the miners have!-)
 

  _
 | |  Robert W. Hayden
 | |  Department of Mathematics
/  |  Plymouth State College MSC#29
   |   |  Plymouth, New Hampshire 03264  USA
   | * |  Rural Route 1, Box 10
  /|  Ashland, NH 03217-9702
 | )  (603) 968-9914 (home)
 L_/  [EMAIL PROTECTED]
  fax (603) 535-2943 (work)





Re: no correlation assumption among X's in MLR

2000-04-28 Thread Donald F. Burrill

On Fri, 28 Apr 2000, EAKIN MARK E wrote:

 Besides independent normal errors with mean zero and constant
 variance, some (many?) econometric text books do make the assumption 
 that the independent variables are uncorrelated.  For example see
 
 Gujarti, Damodar (1988), _Basic Econometrics 2nd edition_, McGraw Hill, 
 p. 166

One is always at liberty to make additional assumptions, especially if 
there are some useful purposes to be served thereby.  The assumption that 
predictors are uncorrelated would be such an additional assumption.  It 
is not necessary for any known purpose in MLR qua MLR;  it may be 
necessary (though frankly I can't think why, but then I'm not an 
econometrician), or perhaps useful, in some econometric models.

I _am_ curious, though:  If one is in the midst of a real-world problem 
of the kind that Professor Gujarati would wish to address, and the real 
predictors one has ARE correlated, what does one do?  Throw up one's 
hands in despair and wail, "It can't be done!" ?
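To Donald's rhetorical question: nothing breaks. Ordinary least squares handles correlated predictors without any such assumption. A sketch with invented data (the coefficients, sample size, and noise level are my choices):

```python
import random

# Two deliberately correlated predictors; true model y = 2 + 1*x1 + 3*x2 + noise
rng = random.Random(1)
n = 200
x1 = [rng.uniform(0, 10) for _ in range(n)]
x2 = [0.5 * a + rng.uniform(0, 2) for a in x1]        # x2 strongly tracks x1
y = [2 + a + 3 * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]

# Normal equations (X'X) beta = X'y for beta = [intercept, b1, b2]
A = [[float(n), sum(x1), sum(x2)],
     [sum(x1), sum(a * a for a in x1), sum(a * b for a, b in zip(x1, x2))],
     [sum(x2), sum(a * b for a, b in zip(x1, x2)), sum(b * b for b in x2)]]
v = [sum(y), sum(a * c for a, c in zip(x1, y)), sum(b * c for b, c in zip(x2, y))]

# Gauss-Jordan elimination (X'X is positive definite here, so no pivoting needed)
M = [row + [vi] for row, vi in zip(A, v)]
for i in range(3):
    M[i] = [x / M[i][i] for x in M[i]]
    for j in range(3):
        if j != i:
            M[j] = [xj - M[j][i] * xi for xj, xi in zip(M[j], M[i])]
beta = [M[i][3] for i in range(3)]
print([round(b, 2) for b in beta])   # recovers roughly [2, 1, 3]
```

Correlation among predictors inflates the standard errors of the coefficients (the usual multicollinearity caveat), but the estimates remain unbiased; only *perfect* collinearity makes the normal equations singular.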
-- DFB.
 
 Donald F. Burrill [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 603-535-2597
 184 Nashua Road, Bedford, NH 03110  603-471-7128  






Re: Unbalanced Nested (Hierarchical) Design

2000-04-28 Thread Donald F. Burrill

On Fri, 28 Apr 2000, Arvind Shah wrote:

 I have an UNBALANCED nested (also called hierarchical) design with 
 Factor A being fixed and the Factor B (within A) random.  So my ANOVA 
 has the line entries (for source): A, B(A), Error (or within cell) and 
 total.  I am looking for the expected mean squares and approaches for 
 computing confidence intervals on the mean for different levels of A. 
 Any help or reference will be highly appreciated. 

When you write "unbalanced", do you mean only that the number of cases 
within each cell is not equal in all cells;  or do you mean the more 
serious problem that the number of levels of Factor B differs between 
levels of A?
If the former, perhaps the simplest approach would be an 
unweighted means analysis (which really means "equally weighted", not 
"UNweighted"!), for which the expected mean squares would be pretty much 
what they'd be for a balanced design (especially if the unbalancing is 
not really severe).  Confidence intervals on the means for different 
levels of A might want to vary according to the number of cases in each 
level;  confidence intervals on the _differences_ between means would be 
more difficult.
Alternatively, cast the entire problem into multiple regression 
format, using indicator variables of one kind or another to represent the 
several levels of A and of B.
-- DFB.
 
 Donald F. Burrill [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264 603-535-2597
 184 Nashua Road, Bedford, NH 03110  603-471-7128  






Re: Statistical Software

2000-04-28 Thread dennis roberts

see http://www.e-academy.com ... for lots of software ... including minitab
at 'rental' prices ... 

At 02:04 PM 4/28/00 -0400, Donald F. Burrill wrote:
On Fri, 28 Apr 2000 [EMAIL PROTECTED] wrote:

 I need to find a statistical software package. Most of my statistical
 work has been done using Microsoft Excel. This has worked out fine;
 however, I need to find a more heavy-duty package, but nothing
 overwhelming. I perform some simple statistical work but would like to
 begin to use a more powerful package. Any suggestion would be great.

I like Minitab, myself. 

footnote to don's comment re: command language ... not only does minitab
downplay it ... and has been for several releases now ... they almost
don't even acknowledge that it exists ... it does! 
==
dennis roberts, penn state university
educational psychology, 8148632401
http://roberts.ed.psu.edu/users/droberts/droberts.htm





Grad Student needs guidance

2000-04-28 Thread Charles D Madewell

I am a graduate student in an engineering program which emphasizes
statistical methods for process improvement and/or product development.
I have found that I love applying statistical methods for
process/product development and testing.  I would not even mind a
company that is developing software applying statistics to
process/product development.  In what types of jobs could I actually apply
these concepts?  In other words, what should I be searching for as
far as job titles?  Your help would be greatly appreciated.

--
Charles Madewell
Implementation of Technology, Process/Product Development;
Statistical Design and Analysis of Experiments,
Regression Analysis & Model Building.







What is the logarithmic distribution? (many questions)

2000-04-28 Thread Vincent Vinh-Hung

General question,
I've seen two descriptions of "logarithmic distribution".
One is related to the frequency of digits called Benford's law (digit 1
occurs more frequently than 2, 2 than 3, etc) whose explanation is that
it is the result of a mixture of distributions.
The other description is a two-page passage, "The logarithmic distribution",
in Kendall and Stuart (1977, The Advanced Theory of Statistics, Vol. 1,
4th edition, pp. 139-140), attributing the derivation to Fisher (1943).
Are these concepts of logarithmic distribution the same or not?

Second question I would like to ask: Kendall and Stuart give an
example of a distribution of the logarithmic type from Fisher (1943),
"distribution of butterflies in Malaya, with theoretical frequencies
given by the logarithmic distribution"
No. of species  Theoretical frequency   Observed frequency
1   135.05  118
2   67.33   74
3   44.75   44
4   33.46   24
5   26.69   29
6   22.17   22
7   18.95   20
etc ...
From what I've understood, the Theoretical frequency was generated
by
  - ( q^r ) / ( r * ln(1-q) )
in which r is the No. of species, q is the probability of the presence
of an attribute.
How was the fit realized, and how can one reproduce it?
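On the second question: the theoretical column can be reproduced from the logarithmic-series probabilities quoted above. In the sketch below, q is back-calculated from the ratio of the first two theoretical frequencies (successive frequencies stand in the ratio q*r/(r+1)); this shortcut is my illustration, while the classical fitting route instead matches the sample mean, E(R) = -q/((1-q) ln(1-q)), to the data.

```python
import math

def logseries_pmf(r, q):
    # Logarithmic-series probability: P(R = r) = -q**r / (r * ln(1 - q)), r = 1, 2, ...
    return -q ** r / (r * math.log(1 - q))

# Back-calculate q from the first two theoretical entries of the butterfly table:
q = 2 * 67.33 / 135.05                  # ~0.997
scale = 135.05 / logseries_pmf(1, q)    # match the r = 1 class
for r in range(1, 8):
    print(r, round(scale * logseries_pmf(r, q), 2))
```

The reproduced column agrees with the quoted Kendall-and-Stuart values to within rounding (44.76 vs. 44.75 for r = 3, and so on).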

With thanks in advance,
Vincent Vinh-Hung





Re: Question about kappa

2000-04-28 Thread Rich Ulrich

On 27 Apr 2000 13:24:01 -0700, [EMAIL PROTECTED] (Robert McGrath)
wrote:

 I am looking for a formula for kappa that applies for very special
 circumstances:
 
 1) Two raters rated each event, but the raters varied across events.
 2) The study involved 100 subjects, each of whom generated approx. 17
 events, so multiple events were generated by the same subject.
 
 I know Fleiss has developed a formula for kappa that allows for multiple
 sets of raters, but is there a formula that is appropriate for the
 circumstance I have described?  Thanks for your help!

I think it was Fleiss who stated that for complex situations, the
kappa is usually equal to the Intraclass correlation (ICC), to the
first two decimal places.  So all you need to do, is this:   Define
the appropriate ANOVA table, and decide on the appropriate version of
the ICC.

My stats-FAQ has a reference on ICC for an unbalanced design.  It
entails approximations, so I hope the design is not *too* unbalanced.
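Fleiss's rule of thumb suggests computing the ICC from the ANOVA mean squares. A one-way random-effects sketch (the data are invented; ICC(1,1) is the version appropriate when the raters differ across subjects, as in McGrath's setup, though his unbalanced clustered case would need the approximations mentioned above):

```python
def icc_oneway(ratings):
    # One-way random-effects ICC(1,1) from a balanced table of ratings:
    # one row of k ratings per subject (the raters may differ across subjects).
    n = len(ratings)                      # subjects
    k = len(ratings[0])                   # ratings per subject
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]
    # Between-subjects and within-subjects mean squares from the one-way ANOVA
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(ratings, means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Invented example: 6 events, each rated by 2 (possibly different) raters
data = [[4, 4], [2, 3], [5, 5], [1, 2], [3, 3], [4, 5]]
print(round(icc_oneway(data), 3))
```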

 snip, McGrath sig.
 snip, Bob Wheeler post; included for no imaginable reason.  
 snip, quoting of Edstat-L message from the bottom of Bob Wheeler's
post 
 snip, Edstat-L message 

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Is Bootstrapping Appropriate?

2000-04-28 Thread Greg Heath

Date: Fri, 28 APR 2000 16:04:34 -0400
From: Rich Ulrich [EMAIL PROTECTED]

 On Fri, 28 Apr 2000 03:31:45 -0400, Greg Heath
 [EMAIL PROTECTED] wrote:
 
   snip, various  
  My simulation currently assumes that the residuals are Gaussian. If 
  this is a bad assumption, I need to know ASAP to prevent higher level 
  decision makers from making some very costly mistakes.
   ...  
 
 Herman Rubin suggested doing simulations, and that seems like a good
 approach.  How much sensitivity is there, to assumptions?  What sort
 of things will change the outcomes, by how much?
 
 Can you reproduce your present data, with your favored assumptions?
 Can you reproduce your present data, with dangerous assumptions?

Good questions. Currently, answers unknown. 

Will respond when I have them.

Thank you.

Greg

Hope this helps.

Gregory E. Heath [EMAIL PROTECTED]  The views expressed here are
M.I.T. Lincoln Lab   (781) 981-2815not necessarily shared by
Lexington, MA(781) 981-0908(FAX)   M.I.T./LL or its sponsors
02420-9185, USA






Re: Is Bootstrapping Appropriate?

2000-04-28 Thread Greg Heath

From: Herman Rubin [EMAIL PROTECTED]
Newsgroups: sci.stat.consult, sci.stat.edu, sci.stat.math

 In article Pine.SOL.3.91.1000428033622.20399C-10@miles,
 Greg Heath  [EMAIL PROTECTED] wrote:
 Date: Fri, 28 APR 2000 00:00:45 GMT
 From: [EMAIL PROTECTED]
 
   ...
 
 One variable, 20 measurements per second, 26.25 seconds (526 measurements).
 The 1/e decorrelation time estimated from the autocorrelation function is 
 ~1 second. Therefore, I will get independent measurements approximately 
 every T0 seconds (probably ~2 <= T0 <= ~4 sec).
 
 Could these correlated measurements have come from a Gaussian distribution?
 
 Please see my responses to the other replies.
 
 Bootstrapping is totally inappropriate.  However, there
 are other simpler simulation methods of obtaining the
 significance level, using any test statistic you wish to
 use, assuming you are willing to use the particular value
 of the correlation coefficient and you are using a
 scale-invariant test.  The variance will not affect your
 test in this problem.  BTW, this method is the one used
 for obtaining significance levels for the
 Kolmogorov-Smirnov test when parameters are estimated.
 
 Construct samples according to the null hypothesis.  The
 samples should be independent; the dependence within each
 sample should follow the model.  Then use the empirical
 distribution to determine the significance of your data set.

Sounds good. Thank you.

However, I'm surprised that simulation would still be necessary even if the 
measurements were independent rather than correlated.

Warren Sarle (private communication) commented that decorrelating the 
series using ARIMA and testing the residuals is also a valid approach.
However, since the variance is estimated I'd still have to use the 
simulation approach to obtain the significance level. Is that correct?

Greg

Hope this helps.

Gregory E. Heath [EMAIL PROTECTED]  The views expressed here are
M.I.T. Lincoln Lab   (781) 981-2815not necessarily shared by
Lexington, MA(781) 981-0908(FAX)   M.I.T./LL or its sponsors
02420-9185, USA








Re: Hypothesis

2000-04-28 Thread David A. Heiser








The EDSTAT traffic after the initial submission by Dennis Roberts on
4/7/2000 interested me. A lot of good thoughts on teaching a fundamental
concept.

His proposal resulted in a total of 117 messages up to 4/27/2000. This
may be a record for comments on a single theme. It struck a chord with 25
separate individuals. Here is my tally.

Name              Number of messages
Dennis Roberts    23
Robert Dawson     18
Herman Rubin      16
Michael Granaas   13
Alan McLean        7
Bruce Weaver       5
Alan Hutson        4
David Heiser       4
Donald Burrill     4
Rich Ulrich        4
Henry Silvert      3
Jon Cryer          2
Thom Baguley       2
Art Kendall        1
Bill Knight        1
Chris Mecklin      1
I Williams         1
Jerrold Zar        1
Jerry Dallal       1
Joe Ward           1
Juha Puranen       1
Magil Brett        1
Milo Schield       1
Richard Barton     1
Robert McGrath     1

The counts, ordered from the most frequent contributor down to the
single-message contributors, follow a Pareto distribution reasonably well.
For some this may be a first encounter with the Pareto distribution, since
it is rarely discussed in stat textbooks.
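The Pareto-like shape of the tally can be checked with a quick rank-frequency sketch; the log-log linearity check below is my illustration, not part of the original post.

```python
import math

# Message counts per contributor, from the tally above, in rank order
counts = sorted([23, 18, 16, 13, 7, 5, 4, 4, 4, 4, 3, 2, 2,
                 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], reverse=True)
xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
ys = [math.log(c) for c in counts]

# Pearson correlation of log(count) vs log(rank); a Pareto/Zipf-like tally
# should look roughly linear on the log-log scale (strongly negative r)
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
sxx = sum((a - mx) ** 2 for a in xs)
syy = sum((b - my) ** 2 for b in ys)
r = sxy / math.sqrt(sxx * syy)
print(round(r, 2))   # strongly negative
```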



DAHeiser