Re: Cumulative Frequency Polygons a right way?

2000-05-22 Thread steve_humphry



> Steve,
>  Your interpretation is right because the coordinates of the ogive (graph
> of cumulative frequency / relative cumulative frequency) indicate "less
> than the upper limit".
> Jin


Thanks kindly Jin, that's what I think has to be the case.

Steve.




===
This list is open to everyone.  Occasionally, less thoughtful
people send inappropriate messages.  Please DO NOT COMPLAIN TO
THE POSTMASTER about these messages because the postmaster has no
way of controlling them, and excessive complaints will result in
termination of the list.

For information about this list, including information about the
problem of inappropriate messages and information about how to
unsubscribe, please see the web page at
http://jse.stat.ncsu.edu/
===



Re: Cumulative Frequency Polygons a right way?

2000-05-22 Thread steve_humphry

Steve) Thanks very much for your response.

One might inquire, if one were pursuing this matter in a little more
depth, why one would not prefer a continuous approximating distribution
(e.g., normal, if that be appropriate, as is often the case), on the
basis either that the empirical CFs at hand represent an instance drawn
from such an idealized population, or that the continuous function is an
adequate approximation to the true population distribution; since the
purpose you describe clearly is to apply the CF information to some
(hypothetical?) set of students whose scores are not in fact
represented in the data in hand.

Steve) Yeah, a normal approximation might be a good idea.  Our data are
typically close to being normally distributed, though since Rasch
measurement is used no assumptions of normality are made (and given
this, I don’t know how the suggestion would go down, but still….).  Is
it about (hypothetical) students not represented in the data?  Well,
yes and no.  Not literally, but in essence, yes – we want to make an
interpolation so as to more closely approximate how many
students 'might have actually' scored below a relatively more precise
score point than our test provides for.  For example, we may have 210
students with an ability (logit) of –1.32, 330 students at –0.81, and
wish to approximate more closely how many students would hypothetically
score below a value of, say, –1.01.  See, the percentages are reported
publicly from year to year, and large fluctuations may cause a stir!
We're in essence trying to anticipate what may happen with a different
test but the same 'cut score' on the same scale, in future years
(assuming the ability distribution stays fairly stable).  Of course,
the only proper way to do this is to measure more accurately (not an
option), but obviously I’m after the best way to approximate given what
we have.
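
A minimal sketch in Python of the interpolation being described, assuming
"<" cumulative percentages plotted at the observed score points; the 210
and 330 come from the example above, everything else is made up:

import numpy as np

# Hypothetical cohort: 2000 students, of whom 800 (made-up figure) score
# below -1.32; 210 score exactly -1.32 and 330 score exactly -0.81.
n_total = 2000
scores = np.array([-1.32, -0.81])
# Percent *below* each score point (the "<" convention used here):
pct_below = np.array([800, 800 + 210]) / n_total * 100

# Linear interpolation between the two plotted points estimates the
# percent scoring below an intermediate cut score of -1.01:
print(np.interp(-1.01, scores, pct_below))  # about 46.4%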

(Of course, the problem you describe below still arises, in terms of
how one converts from the discrete empirical CF function to the
(idealized?) continuous function; this is much less a problem if the
continuous function is obtained from information other than the CFs
themselves -- e.g., an approximating normal distribution would be
derived from the empirical mean and standard deviation, not from the
empirical CFs.)

Steve) I can only see us doing this if the normal (or other
distribution) is a close approximation at all, or at least most, points
along the scale.  But thanks, this is well worth exploring.

If by "cumulative frequency" ("CF" above) you mean "observed frequency
of responses less than or equal to this score value", and especially if
these CFs have been cumulated over a grouped empirical frequency
distribution, your logic is impeccable. If you've been cumulating at
the level of individual score values, there may be room for SOME
quibbling.

Steve) No, I mean < , though I don’t see that it makes a great deal of
difference for interpolation given that we may be talking about any
point up to a couple of decimal places on a scale of range about –5 to
+5.

First, make sure you're all on the same wavelength. You clearly are
thinking in terms of "<=" CFs; plotting at the lower limit would be
appropriate for "strictly <" CFs (or equivalently ">=" CFs). Plotting
at the midpoint would be reasonable if one took for one's CF the
midpoint between a "strictly <" CF and a "<=" CF. If upon examination
it turns out that your colleagues (?) really think they're dealing
with "<<=" CFs:

Steve) I’m not sure I explained in sufficient detail.  We want to make
interpolations, potentially at any point on the continuum (to a couple
of decimals).  Nonetheless, this is something that needs to be
explicitly clarified, you’re right.  It hasn’t been to date, so far as
I’m aware (I’ve assumed everyone means ‘percentage below the score’).

You might ask them how they view the two intervals at the extreme ends
of the CFs. In terms of relative cumulative percents (C%s), what scores
then apply to the upper and lower limits of (1) the lowest non-empty
score interval; (2) the highest score interval? And in particular, what
C% applies to the upper limit of the highest interval? Either of the
two alternatives you report implies a C% > 100% here, which ought to be
absurd enough for anyone with a decent grasp of reality.

Steve) That’s it!  Perfect way to make the point, I think.  I did think
of that some time ago, but I must admit it has slipped my mind since.
A reductio ad absurdum should hit the spot!  Thanks again.

Another approach is to inquire how one would arrange a CF
downward -- i.e., where the C%s range from 0 at the maximum value to
100% at the minimum, and the CFs represent the frequency of responses
greater than or equal to this score value.

Steve) Yes, I’ve raised this.

As for references, well the logic is all that concerns me, I can assure
you.  However, that done, anything else to make the case would be
good.  I've consulted with a couple of texts already, and they
recommen

Re: obsolete methods?

2000-05-22 Thread steve_humphry



>
> first, items don't have intensity ... people do in response TO an
> item ...

An item's intensity is defined in terms of the response it elicits in
persons.  I'm using Thurstone's terminology.  If you're not happy with
that shorthand, fair enough! :-)

> second, just because (to use an analogy) someone scores (say on a 30 item
> test) high on the test does not mean that they got all items right nor,
> would we expect them to ... so, just because someone has a fairly strong +
> feeling towards a bank does not mean that they agree with (nor would we
> expect them to) all the practices of the bank ...

Who said anything about 'all practices in the bank'?  This is probably
not a good way to elicit responses indicative of 'level of
satisfaction'.  In principle though, someone higher on satisfaction (X)
than another (Y) should tend to agree with most or all of the
statements that Y agrees with, then some more.  If not, you do not have
a basis for obtaining measurements (unless you're using an unfolding
structure).


> scale scores (not even
> from a rasch developed scale) are not a true guttman scale

Steve) It is fairly simple to show that Rasch is a probabilistic
Guttman scale -- that is, the patterns of scoring corresponding with
the Guttman structure are most probable under the Rasch model, and
patterns close to the structure are more probable than ones removed
from that structure.
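
A quick numerical sketch of that claim (illustrative person and item
locations, dichotomous items; nothing here is from real data):

import itertools
import numpy as np

def p_agree(beta, delta):
    # Rasch model: P(agree) for a person at beta on an item at delta.
    return 1.0 / (1.0 + np.exp(-(beta - delta)))

deltas = np.array([-1.0, 0.0, 1.0])  # item locations, easiest first
beta = 0.5                           # person location (made up)
p = p_agree(beta, deltas)

# Probability of every response pattern with raw score 2:
for x in itertools.product([0, 1], repeat=3):
    if sum(x) == 2:
        xa = np.array(x)
        print(x, np.prod(np.where(xa == 1, p, 1 - p)))
# (1, 1, 0) -- the Guttman pattern, agreeing with the two easiest
# items -- comes out most probable; (0, 1, 1) least.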

> certainly though ... one does not need the rasch model to detect these
> tendencies ...

I agree.

Steve.





Re: LISREL and Confirmatory FA

2000-05-22 Thread T.S. Lim

In article <8gcm45$i6e$[EMAIL PROTECTED]>, [EMAIL PROTECTED] says...
>
>Try the free student version of AMOS for structural equation
>modeling
>http://www.smallwaters.com/amos/student.html
>
>AMOS does factor analysis, path analysis
>and includes online documentation.


There's also the free Mx package that does structural equation 
modeling. The link is at

   http://www.kdcentral.com


>In article <8fjhn0$8rd$[EMAIL PROTECTED]>,
>  "Buoy" <[EMAIL PROTECTED]> wrote:
> < snip -- the full question is quoted in Gene Gallagher's post below >
>
>--
>Eugene D. Gallagher
>ECOS, UMASS/Boston

-- 
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
__
Get paid to write a review! http://recursive-partitioning.epinions.com






Re: LISREL and Confirmatory FA

2000-05-22 Thread Gene Gallagher

Try the free student version of AMOS for structural equation
modeling
http://www.smallwaters.com/amos/student.html

AMOS does factor analysis, path analysis
and includes online documentation.

In article <8fjhn0$8rd$[EMAIL PROTECTED]>,
  "Buoy" <[EMAIL PROTECTED]> wrote:
> Hello to all
>
> I'm a Sociology student at Warsaw University finishing my 6th semester.
> During the last year I participated in a course on quantitative
> methodology. While analyzing survey data from the Polish General Social
> Survey - PGSS (which was conducted in 1992, 93, 94, 95, 96, 97, 99;
> further information on the website of the Institute for Social Studies:
> http://andante.iss.uw.edu.pl) the group was learning about various
> methods of hypothesis testing using Jacques Tacq's "Multivariate
> Analysis in Social Science Research". I noticed the substantial lack of
> literature on the subject on the Polish book market. Apart from the
> small set of SAGE publications in the Institute's library there are no
> books about methods of data analysis.
>
> Lately I had to perform a confirmatory factor analysis. I downloaded a
> free version of Joreskog's LISREL from the SSI site. Unfortunately there
> was no possibility of downloading the manual for free. The prices were
> also out of my financial reach. I was wondering if any of you could give
> me the location of free tutorials and texts on confirmatory factor
> analysis and structural equation modeling available on the Internet. I'm
> also interested in some brief examples of those procedures.
>
> Thank you in advance for any help
>
> Michal Bojanowski
>

--
Eugene D. Gallagher
ECOS, UMASS/Boston





Re: Square root transformation

2000-05-22 Thread Herman Rubin

In article <[EMAIL PROTECTED]>, G. Anthony Reina <[EMAIL PROTECTED]> wrote:
>We use multiple linear regression to perform our analyses. Because we
>work with binned data (discharge frequency of a neuron) which follow a
>non-normal (Poisson) distribution, we typically use the square root
>transform on the dependent variable (discharge rate of the neuron).
>(Actually, the transformation is sqrt(spike rate + 3/8) )

Is there enough independence that the counts should be Poisson?

If so, the square root transformation does stabilize the
variance, but it introduces a bias.  In addition, any
non-linear transformation destroys the linearity of the
model.

The most important criterion for a regression or similar
procedure is the form of the model; for a linear regression,
with any number of independent variables, the linearity is
most important.  You COULD run a non-linear regression,
using the square root of a linear combination of independent
variables, or you could use a Poisson model and maximum
likelihood, or others.
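
As a rough sketch of the two routes just mentioned (simulated counts;
statsmodels is one package that fits both, assumed here for illustration):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=500)          # one simulated predictor
y = rng.poisson(np.exp(0.5 + 0.8 * x))   # Poisson counts, log-linear rate
X = sm.add_constant(x)

# Route 1: least squares on the variance-stabilized response.
ols = sm.OLS(np.sqrt(y + 3.0 / 8.0), X).fit()

# Route 2: Poisson regression by maximum likelihood.
pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()

print(ols.params)   # slope on the sqrt scale -- not directly a rate effect
print(pois.params)  # close to (0.5, 0.8), on the log-rate scale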

>I've been trying to show that some independent variables account for
>more of the variance explained in the dependent variable. However, some
>researchers in my field argue that the square root transform could
>artificially bias my results so that some independent variables account
>for more of the variance than they really should. I don't see how this
>could be from a theoretical level. Plus, I've run the multiple
>regression without the transform and seen only about a 5% difference
>(not much).

It certainly can.  If one variable is more important at the low 
end, and another at the high end, this will happen.

>Does anybody know if these criticisms have any theoretical merit? I
>can't see how this can be so. I thought that the square-root transform
>was a pretty sound way of reducing your chance of biasing the analysis
>if the data is non-normal (which most parametric tests require).

Your tests are only approximate, anyhow.  The most important 
thing is the form of the model; use your theoretical knowledge
to decide which ones to use.  It usually does not matter how
good the tests are if the model is not accurate, and whatever
null hypothesis you test is going to be false, anyhow.

It is up to you to decide the meaning of the form of the model,
without regard to statistical testing.
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: Cumulative Frequency Polygons a right way?

2000-05-22 Thread David A. Heiser


- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, May 22, 2000 1:15 AM
Subject: Cumulative Frequency Polygons a right way?


>
>
> Hi all,
>
> First up, the purpose I have at hand is to make interpolations for
> percentages of students who have achieved above a certain score on a
> test (where this score may lie between two discrete score points on the
> scale)
_
ETC (see his original)

I have always assumed that the Kaplan-Meier estimator is the accepted
plotting method. You will find this in one of your stat books or in some
text on failure-time analysis.

For large bin sets there is very little difference between the three
positions you give.
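
With no censoring the Kaplan-Meier estimate reduces to the ordinary
empirical CDF, which is perhaps the simplest way to see the plotting
positions (a sketch with made-up scores):

import numpy as np

def ecdf(sample):
    # Fraction of observations <= x: a right-continuous step function,
    # i.e. the Kaplan-Meier estimate when nothing is censored.
    xs = np.sort(np.asarray(sample, dtype=float))
    return xs, np.arange(1, xs.size + 1) / xs.size

xs, ys = ecdf([12, 15, 15, 18, 20, 23, 23, 23, 27, 30])
for x, y in zip(xs, ys):
    print(x, y)   # each jump sits AT the observed value, not below it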

DAHeiser




> It seems to me cumulative frequencies should be plotted at the exact
> upper limit of each interval.  This is the only simple method that
> makes sense to me.
>
> However, it has been suggested by others in the context I'm dealing
> with that frequencies/percentages can alternatively be plotted at the
> mid-point of each interval, or even at the lower limit!  Although I can
> understand plotting graphs at the mid-point for ease of representation,
> this hardly seems suited to making interpolations.  This is because
> when you read off the graph at the upper limit of a given interval, you
> will (probably) have more cases than fell up to and including the
> interval itself.  This is surely absurd, yet people seem to seriously
> believe it is a viable alternative.
>
> I'm really hoping for a good reference on this (preferably by a highly
> regarded author to make the case stronger :).  Any comments?  Any nice
> references?
>
> Thanks!
>
> Steve.
>






bad graphs

2000-05-22 Thread dennis roberts

i spotted this ...

http://www.sa.psu.edu/sara/pulse/bookstore.html

about 1/2 way down the page ... see the graph titled

"Penn State Bookstore Support for Activities" ...

should these dots be connected?
Dennis Roberts, EdPsy, Penn State University
208 Cedar Bldg., University Park PA 16802
Email: [EMAIL PROTECTED], AC 814-863-2401, FAX 814-863-1002
WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
FRAMES: http://roberts.ed.psu.edu/users/droberts/drframe.htm






Re: Least squares Was: Re: what is s.d.?

2000-05-22 Thread dennis roberts

At 12:41 PM 5/22/00 -0500, Herman Rubin wrote: (in response to bob hayden's 
note)


>As for outliers, the appropriate meaning for them is that they
>are observations which are incorrect, or for which the assumptions
>of the model are invalid.  Those should be removed, as should
>any others of that type.

i think this is either worded incorrectly or ... incorrect ...

outliers might be observations that don't fit the model ... but, that does 
not make the observations incorrect ...
you should only remove the observations IF you have some concurrent 
information that the values indeed ... ARE incorrect data points ... for 
some specific reason(s) ... MIScalculated ... entered wrong ... etc.

just because they don't look nice (according to some model) is not good 
enough

>
>--
>This address is for information only.  I do not claim that these views
>are those of the Statistics Department or of Purdue University.
>Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
>[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558
>

Dennis Roberts, EdPsy, Penn State University
208 Cedar Bldg., University Park PA 16802
Email: [EMAIL PROTECTED], AC 814-863-2401, FAX 814-863-1002
WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
FRAMES: http://roberts.ed.psu.edu/users/droberts/drframe.htm






Re: Square root transformation

2000-05-22 Thread T.S. Lim

In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] says...
>
>We use multiple linear regression to perform our analyses. Because we
>work with binned data (discharge frequency of a neuron) which follow a
>non-normal (Poisson) distribution, we typically use the square root
>transform on the dependent variable (discharge rate of the neuron).
>(Actually, the transformation is sqrt(spike rate + 3/8) )
>
>I've been trying to show that some independent variables account for
>more of the variance explained in the dependent variable. However, some
>researchers in my field argue that the square root transform could
>artificially bias my results so that some independent variables account
>for more of the variance than they really should. I don't see how this
>could be from a theoretical level. Plus, I've run the multiple
>regression without the transform and seen only about a 5% difference
>(not much).
>
>Does anybody know if these criticisms have any theoretical merit? I
>can't see how this can be so. I thought that the square-root transform
>was a pretty sound way of reducing your chance of biasing the analysis
>if the data is non-normal (which most parametric tests require).
>
>Thanks.
>-Tony


You can try a straight Poisson regression. If the conclusions you obtained 
from a Poisson regression are consistent with those from a square-root 
transformation, you'd be OK. The main purpose of a square-root transform in 
your case is to stabilize the variance of the error terms.
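
(A quick simulation suggesting why, under the assumption that the counts
really are Poisson: the transform makes the variance roughly constant at
1/4 regardless of the mean.)

import numpy as np

rng = np.random.default_rng(42)
for lam in [2, 5, 10, 50]:
    y = rng.poisson(lam, size=100_000)
    print(lam, np.var(np.sqrt(y + 3.0 / 8.0)))  # all approximately 0.25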

-- 
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
__
Get paid to write a review! http://recursive-partitioning.epinions.com






Re: sas vs s-plus for qc

2000-05-22 Thread T.S. Lim

In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] says...
>
>Check out Minitab Release 13. This is the software used by most of the Six
>Sigma Black Belt companies. It has very strong DOE, SPC, Process Capability,
>and Measurement System Analysis tools. Also, make sure you take a look at
>their help tools (the manuals, on-line help, real-time tutorials, and their
>new statguide) - it is without a doubt best in class.


I wouldn't put Minitab in the same class as SAS and S-Plus. Minitab 
belongs to a class below that for SAS, S-Plus, SPSS. IMO, Minitab is 
still good only for teaching purposes. Professional data analysts don't 
use Minitab.


>Also, don't underestimate the fact that Minitab Inc. has essentially one
>product: Minitab. Their support isn't watered down by a myriad of modules
>and other software (such as SPSS and SAS).
>
>You can download a full working copy (limited to 30 days of use) at
>http://www.minitab.com
>
>(I don't work for Minitab or have any connection with them except for being
>an extremely satisfied customer)

>Patrick Lee wrote:
>
>> Dear fellow newsgroupers;
>> I am trying to find suitable software for quality control analysis that
>> my manager is about to conduct. I had not used SAS/QC software but have
>> used S-Plus for graphics and find that S-Plus is quicker for graphics. I
>> understand that S-Plus has a DOX module and was wondering if anyone had
>> experiences, good or bad, with this software. I was also wondering if
>> there are any good specialized software for QC or DOX analysis other
>> than SAS or S-Plus. Thanks in advance.
>>
>> Patrick Lee

-- 
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
__
Get paid to write a review! http://recursive-partitioning.epinions.com






Re: Distribution Free Tolerance Limits

2000-05-22 Thread Rich Ulrich

The Subject was written as "Distribution Free Tolerance Limits."

 - here was the statement, 
"We're doing some research in statistical classification of
abnormalities in retinal images (that is, pattern recognization), and
we need to estimate the size of the sample nescesary. We've heard of
some tables for this purpose  and would like to know if someone
knows where to find them (or something similar)."

Tolerance limits?  Are limits wanted in order to reject a bunch of
possible outliers? or  to select the outliers?  

And you are asking for the "size of the sample necessary" -- in order
to achieve what end?   I guess I also suspect that I want to frame the
problem as something other than what I consider "tolerance limits",
and I don't know to what extent the question ought to be serious about
"distribution free."

On 19 May 2000 00:02:52 -0700, [EMAIL PROTECTED] wrote:
> Chebycheff's Inequality redivivus!
> 
 - Right -  that is an ultimate distribution-free result, if I
remember correctly, which only requires finite variance.  (Not
necessarily unimodal.)  (And in practice, you ought to do a whole lot
better, right? ...)  I don't know what those texts may be giving -
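
 - for what it's worth, the Chebyshev arithmetic itself is one line; note
this sketch uses the true mean and SD, whereas the tabled tolerance
factors quoted below also allow for estimating them from a sample:

import math

# Chebyshev: P(|X - mu| >= k*sigma) <= 1/k**2 for any finite-variance
# distribution, so mu +/- k*sigma covers at least p when k = 1/sqrt(1-p).
for p in [0.90, 0.95, 0.99]:
    print(p, 1.0 / math.sqrt(1.0 - p))   # k = 3.16, 4.47, 10.0
# Compare normal-theory 1.64, 1.96, 2.58 -- the price of distribution-free.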


> Tables of the tolerance factors may be found in the following two 
> venerable texts. (They can also be calculated from the inequality with a 
> number of numerical analysis packages for the Mac or the PC)
> 
> Engineering Statistics, 2nd Edition; Bowker and Lieberman, Prentice Hall
> Introduction to Statistical Analysis, 4th Edition, Dixon and Massey, 
> McGraw Hill

 < snip >

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Square root transformation

2000-05-22 Thread G. Anthony Reina

We use multiple linear regression to perform our analyses. Because we
work with binned data (discharge frequency of a neuron) which follow a
non-normal (Poisson) distribution, we typically use the square root
transform on the dependent variable (discharge rate of the neuron).
(Actually, the transformation is sqrt(spike rate + 3/8) )

I've been trying to show that some independent variables account for
more of the variance explained in the dependent variable. However, some
researchers in my field argue that the square root transform could
artificially bias my results so that some independent variables account
for more of the variance than they really should. I don't see how this
could be from a theoretical level. Plus, I've run the multiple
regression without the transform and seen only about a 5% difference
(not much).

Does anybody know if these criticisms have any theoretical merit? I
can't see how this can be so. I thought that the square-root transform
was a pretty sound way of reducing your chance of biasing the analysis
if the data is non-normal (which most parametric tests require).

Thanks.
-Tony


--
G. Anthony Reina, MD
The Neurosciences Institute
10640 John Jay Hopkins Drive
San Diego, CA 92121
Phone: (858) 626-2132
FAX: (858) 626-2199









Re: non normal multivariate outlier detection

2000-05-22 Thread Herman Rubin

In article <8gal3d$a0e$[EMAIL PROTECTED]>,
Manuel Castejon Limas <[EMAIL PROTECTED]> wrote:
>Dear people,

>I'm looking for outlier detection methods in non-normal multivariate
>distributions.

>Any help would be appreciated.


The idea of an outlier depends heavily on the distribution;
there is no such thing as an absolute outlier.

The purpose of detecting an outlier is to flag an incorrect or
spurious observation, since including it would be likely to give
incorrect results.
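
If a concrete starting point helps (a sketch only: the minimum covariance
determinant still assumes roughly elliptical contours, so for strongly
non-normal data treat the flagged points as candidates, not verdicts):

import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(1)
X = rng.standard_t(df=4, size=(500, 3))   # heavy-tailed simulated data

# Robust location/scatter, then squared Mahalanobis distances from it:
mcd = MinCovDet(random_state=0).fit(X)
d2 = mcd.mahalanobis(X)
print(np.where(d2 > np.quantile(d2, 0.99))[0])  # points worth inspecting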


-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Least squares Was: Re: what is s.d.?

2000-05-22 Thread Herman Rubin

In article <[EMAIL PROTECTED]>,
Bob Hayden <[EMAIL PROTECTED]> wrote:



>Least squares methods are in some sense optimal when the "errors"
>estimated by the residuals are normally distributed.  They are
>questionable when the errors are multimodal, strongly skewed, or
>afflicted with outliers.

Least squares is not optimal without such conditions.  It is 
valid under much weaker assumptions; the Gauss-Markov theorem
does not care if the errors are multimodal or strongly skewed.
In such cases, so-called robust procedures like least absolute
value are likely to be invalid.  If the dependent variable is
linear in the "independent variables" (not necessarily functionally
independent) of the model linear in the parameters, and the errors
are uncorrelated with the independent variables, least squares
is valid; with more assumptions, one might do better.
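
(A small simulation of that point, with strongly skewed errors that are
independent of the regressor:)

import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 10, 50)
slopes = []
for _ in range(5000):
    e = rng.exponential(scale=2.0, size=x.size) - 2.0  # skewed, mean zero
    y = 1.0 + 2.0 * x + e
    slopes.append(np.polyfit(x, y, 1)[0])
print(np.mean(slopes))  # about 2.0 -- least squares stays unbiased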

As for outliers, the appropriate meaning for them is that they
are observations which are incorrect, or for which the assumptions
of the model are invalid.  Those should be removed, as should
any others of that type.  
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: Ann: Fortran2000.com -- All About Fortran

2000-05-22 Thread Jerry Dallal

I wrote:
 
> I tried, but there was "no response from server".

It's working for me now.





Re: obsolete methods?

2000-05-22 Thread dennis roberts

At 09:14 PM 5/22/00 +0800, Stephen Humphry wrote:

>It doesn't offer guidance up front exactly, no, but it provides feedback on
>whether items work, and an important (imv) conceptual framework for test
>construction.  For example, if you have the Rasch model in mind, you look to
>developing items of a range of difficulty (or 'affective intensity').  You
>wouldn't necessarily think to do this if you were only using other
>techniques, yet it is surely important.  Take the extreme example of a test
>in which every item is of the same affective intensity -- say for
>'satisfaction with your bank'.  Everyone who is higher than a certain
>satisfaction level would be expected to agree with all items, whereas
>everyone below a certain satisfaction would be expected to disagree (of
>course, this probably won't happen but in reality you may get something
>approaching this situation).


first, items don't have intensity ... people do in response TO an item ...
second, just because (to use an analogy) someone scores (say on a 30 item 
test) high on the test does not mean that they got all items right nor, 
would we expect them to ... so, just because someone has a fairly strong + 
feeling towards a bank does not mean that they agree with (nor would we 
expect them to) all the practices of the bank ... scale scores (not even 
from a rasch developed scale) are not a true guttman scale

certainly though ... one does not need the rasch model to detect these 
tendencies ...

Dennis Roberts, EdPsy, Penn State University
208 Cedar Bldg., University Park PA 16802
Email: [EMAIL PROTECTED], AC 814-863-2401, FAX 814-863-1002
WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
FRAMES: http://roberts.ed.psu.edu/users/droberts/drframe.htm






what is s.d.?

2000-05-22 Thread Bob Hayden

The standard deviation of a single batch of numbers is a typical value
for the residuals (deviations from the mean).  If you divide by n, it
is the root mean square (RMS) of the residuals.  You can check your calculation of
the s.d. by comparing it to the residuals.  The mean is the measure of
center that minimizes the sum of the squared residuals, so the s.d. is
the measure of variability that goes with the mean in particular, and
with least squares in general.  For simple linear regression, s is a
typical value for the residuals (deviations from the regression line).
For multiple regression, s is a typical value for the residuals
(deviations from the model).  There's a pattern here!-)
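
A numeric check of the pattern (any small batch will do):

import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
residuals = x - x.mean()

# Dividing by n: the s.d. is exactly the RMS of the residuals ...
print(np.sqrt(np.mean(residuals ** 2)), np.std(x))   # both 2.0

# ... while the usual sample s.d. divides by n - 1 instead:
print(np.std(x, ddof=1))                             # about 2.14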

Least squares methods are in some sense optimal when the "errors"
estimated by the residuals are normally distributed.  They are
questionable when the errors are multimodal, strongly skewed, or
afflicted with outliers.
 

Robert W. Hayden
Department of Mathematics
Plymouth State College MSC#29
Plymouth, New Hampshire 03264  USA
82 River Street
Ashland, NH 03217-9702
(603) 968-9914 (home)
fax (603) 535-2943 (work)
[EMAIL PROTECTED]
http://mathpc04.plymouth.edu





Re: Cumulative Frequency Polygons a right way?

2000-05-22 Thread Jineshwar Singh

Steve,
 Your interpretation is right because the coordinates of the ogive (graph of
cumulative frequency / relative cumulative frequency) indicate "less than
the upper limit".
Jin

Jineshwar Singh, Coordinator, IDS
Interdisciplinary Department
George Brown College
St. James campus
[EMAIL PROTECTED]
*
You cannot control how others act but you can
control how you react.
416 -415-2089
http://www.gbrownc.on.ca/~jsingh

- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, May 22, 2000 2:08 AM
Subject: Cumulative Frequency Polygons a right way?


> Hi all,
>
> First up, the purpose I have at hand is to make interpolations for
> percentages of students who have achieved above a certain score on a
> test (where this score may lie between two discrete score points on the
> scale).
>
> It seems to me cumulative frequencies should be plotted at the exact
> upper limit of each interval.  This is the only simple method that
> makes sense to me.
>
> However, it has been suggested by others in the context I'm dealing
> with that frequencies/percentages can alternatively be plotted at the
> mid-point of each interval, or even at the lower limit!  Although I can
> understand plotting graphs at the mid-point for ease of representation,
> this hardly seems suited to making interpolations.  This is because
> when you read off the graph at the upper limit of a given interval, you
> will (probably) have more cases than fell up to and including the
> interval itself.  This is surely absurd, yet people seem to seriously
> believe it is a viable alternative.
>
> I'm really hoping for a good reference on this (preferably by a highly
> regarded author to make the case stronger :).  Any comments, or refs?
>
> Thanks!
>
> Steve.
>






Re: sas vs s-plus for qc

2000-05-22 Thread Ken K.

Check out Minitab Release 13. This is the software used by most of the Six
Sigma Black Belt companies. It has very strong DOE, SPC, Process Capability,
and Measurement System Analysis tools. Also, make sure you take a look at
their help tools (the manuals, on-line help, real-time tutorials, and their
new statguide) - it is without a doubt best in class.

Also, don't underestimate the fact that Minitab Inc. has essentially one
product: Minitab. Their support isn't watered down by a  myriad of modules
and other software (such as SPSS and SAS).

You can download a full working copy (limited to 30 days of use) at
http://www.minitab.com

(I don't work for Minitab or have any connection with them except for being
an extremely satisfied customer)

Patrick Lee wrote:

> Dear fellow newsgroupers;
> I am trying to find suitable software for quality control analysis that
> my manager is about to conduct. I had not used SAS/QC software but have
> used S-Plus for graphics and find that S-Plus is quicker for graphics. I
> understand that S-Plus has a DOX module and was wondering if anyone had
> experiences, good or bad, with this software. I was also wondering if
> there are any good specialized software for QC or DOX analysis other
> than SAS or S-Plus. Thanks in advance.
>
> Patrick Lee






Re: obsolete methods?

2000-05-22 Thread Stephen Humphry



> This is all fine, but please remember that Rasch is essentially a
> sophisticated (and much more thoughtful) mathematical model for
> describing the properties of items and people;  it offers no guidance on
> how to write items for an attitude measurement scale.  One still has to
> define constructs, write items and design an appropriate response mode.

It doesn't offer guidance up front exactly, no, but it provides feedback on
whether items work, and an important (imv) conceptual framework for test
construction.  For example, if you have the Rasch model in mind, you look to
developing items of a range of difficulty (or 'affective intensity').  You
wouldn't necessarily think to do this if you were only using other
techniques, yet it is surely important.  Take the extreme example of a test
in which every item is of the same affective intensity -- say for
'satisfaction with your bank'.  Everyone who is higher than a certain
satisfaction level would be expected to agree with all items, whereas
everyone below a certain satisfaction would be expected to disagree (of
course, this probably won't happen but in reality you may get something
approaching this situation).  In this case, the instrument will not
effectively discriminate between a person somewhat lower on satisfaction than
all your items are targeted toward (eg lower than the level of satisfaction
needed to just agree with a certain statement), versus someone far less
satisfied than that again (both will simply disagree with all or most
statements).  Conversely, if you have a set of items which target a range of
satisfaction levels, you expect different scores for most people on the test,
dependent upon their particular level of satisfaction (roughly in keeping
with a Guttman structure).  Surely that's what you should be after!  It also
tells you whether categories on Likert scales function well or not.  For
example, 'neutral' categories don't typically work very well (mind you, I
can't give empirical evidence, this is just what people have found in
experience, including myself).  Looking at which items fit and which don't
obviously provides critical information about the nature of the construct
itself.  Via feedback from these sorts of things, you certainly get an idea
of what kinds of items and response modes effectively elicit responses
indicative of a latent trait.  That is, responses governed stochastically by
item 'difficulty' and person 'ability' (or affective intensity and
satisfaction).
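
A small sketch of the targeting point (illustrative item and person
locations only): compare the expected raw score curves for a test whose
items all sit at one location against one whose items are spread.

import numpy as np

def expected_score(beta, deltas):
    # Expected raw score on dichotomous Rasch items for a person at beta.
    return np.sum(1.0 / (1.0 + np.exp(-(beta - np.asarray(deltas)))))

clustered = [0.0] * 5                   # five items, one shared intensity
spread = [-2.0, -1.0, 0.0, 1.0, 2.0]    # items targeting a range of levels

for beta in [-3.0, -1.5, 0.0, 1.5, 3.0]:
    print(beta,
          round(expected_score(beta, clustered), 2),
          round(expected_score(beta, spread), 2))
# The clustered test concentrates its discrimination near beta = 0; the
# spread items move the expected score more evenly across the whole range.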

> Rasch provides a mathematical rationale for selecting items for
> inclusion in a scale, using the criterion of "fit to the model".  I
> don't claim great expertise here, but when I ran an attitude scale
> through a Rasch analysis and a traditional item analysis/factor analysis
> (many years ago), the decisions reached about which items to include or
> exclude were not too different.
>

Sure, this may be the case.  Not necessarily though.  I have tried the same
on a couple of occasions and found that the decisions were quite different
based on Rasch analysis vs Factor Analysis.  This would in fact have been
expected given the Rasch analysis because the Likert categories did not
effectively discriminate with respect to the latent trait.  Correlational
techniques obviously rely upon having roughly equal intervals between score
points.

>
> I regard Rasch as a synthesis of the Thurstone and Likert techniques.
> Thurstone placed much emphasis on item calibration, getting large
> numbers of judges to rate where items were located on a supposedly
> interval scale, but used only a small number of items to measure
> individuals' attitudes.  Likert placed much emphasis on person
> measurement, using a large number of items to measure people's
> attitudes, but placed less emphasis on the calibration of item
> properties.  Rasch places equal emphasis on person measurement and item
> calibration, and uses a common measurement scale for both.  However,
> bear in mind that all are psychometric methods which attempt to measure
> attitudes by producing a scale score.  I took the original question that
> started off this thread to wonder about psychometric methods were
> obsolete, and not whether Likert and Thurstone had been replaced by
> better mathematical models.
>
> Paul Gardner

Rasch measurement is in essence equivalent to Thurstone's law of comparative
judgement except that (a) the person parameter is substituted for one of the
item parameters, and (b) the logistic function is substituted for the
normal.  It is based on the same logic but the above trick allows separation
of person and item parameters.  Yes, all are methods which attempt to measure
attitudes (or whatever) by producing a score.  However, Rasch uses a
non-linear 'transformation' of the raw score so is fundamentally different
from Likert's approach (not Thurstone's of course, in that respect).  On your
last statment above, many would argue that these psychometric methods are
obsolete precisely becau

Re: Cumulative Frequency Polygons a right way?

2000-05-22 Thread dennis roberts

a cumulative frequency is up to SOME point ... the problem is, WHAT is the
point? does it include THE point? i don't really see much (if any) difference
between (say we have a score scale that goes up to 50 and, 1 point is given 
for each valid response) saying we have accumulated 53% to a score of 38 
... or, the upper limit of 38.5 ... or for that matter, anywhere between 38 
and (but not quite) 39 ... people can't get scores of decimal values anyway

an upper limit, by definition, is always a value that can't be achieved

At 06:08 AM 5/22/00 +, [EMAIL PROTECTED] wrote:
>Hi all,
>
>First up, the purpose I have at hand is to make interpolations for
>percentages of students who have achieved above a certain score on a
>test (where this score may lie between two discrete score points on the
>scale).
>
>It seems to me cumulative frequencies should be plotted at the exact
>upper limit of each interval.  This is the only simple method that
>makes sense to me.
>
>However, it has been suggested by others in the context I'm dealing
>with that frequencies/percentages can alternatively be plotted at the
>mid-point of each interval, or even at the lower limit!  Although I can
>understand plotting graphs at the mid-point for ease of representation,
>this hardly seems suited to making interpolations.  This is because
>when you read off the graph at the upper limit of a given interval, you
>will (probably) have more cases than fell up to and including the
>interval itself.  This is surely absurd, yet people seem to seriously
>believe it is a viable alternative.
>
>I'm really hoping for a good reference on this (preferably by a highly
>regarded author to make the case stronger :).  Any comments, or refs?
>
>Thanks!
>
>Steve.
>
>

Dennis Roberts, EdPsy, Penn State University
208 Cedar Bldg., University Park PA 16802
Email: [EMAIL PROTECTED], AC 814-863-2401, FAX 814-863-1002
WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
FRAMES: http://roberts.ed.psu.edu/users/droberts/drframe.htm






Re: non normal multivariate outlier detection

2000-05-22 Thread Richard M. Barton

Hello Manuel,

I think a good place to start is Barnett, V. and Lewis, T., Outliers in
Statistical Data.

rick


--- "Manuel Castejon Limas" wrote:
Dear people,

I'm looking for outlier detection methods in non-normal multivariate
distributions.

Any help would be appreciated.





--- end of quote ---





Re: Cumulative Frequency Polygons a right way?

2000-05-22 Thread Donald F. Burrill

On Mon, 22 May 2000 [EMAIL PROTECTED] wrote:

> First up, the purpose I have at hand is to make interpolations for
> percentages of students who have achieved above a certain score on a
> test (where this score may lie between two discrete score points on the 
> scale).

One might inquire, if one were pursuing this matter in a little more 
depth, why one would not prefer a continuous approximating distribution 
(e.g., normal, if that be appropriate, as is often the case), on the 
basis either that the empirical CFs at hand represent an instance drawn 
from such an idealized population, or that the continuous function is an 
adequate approximation to the true population distribution;  since the 
purpose you describe clearly is to apply the CF information to some 
(hypothetical?) set of students whose scores are not in fact represented 
in the data in hand.  (Of course, the problem you describe below still 
arises, in terms of how one converts from the discrete empirical CF 
function to the (idealized?) continuous function;  this is much less a 
problem if the continuous function is obtained from information other 
than the CFs themselves -- e.g., an approximating normal distribution 
would be derived from the empirical mean and standard deviation, not from 
the empirical CFs.)

> It seems to me cumulative frequencies should be plotted at the exact
> upper limit of each interval.  This is the only simple method that
> makes sense to me.

If by "cumulative frequency" ("CF" above) you mean "observed frequency of 
responses less than or equal to this score value", and especially if 
these CFs have been cumulated over a grouped empirical frequency 
distribution, your logic is impeccable.  If you've been cumulating at the 
level of individual score values, there may be room for SOME quibbling.

> However, it has been suggested by others in the context I’m dealing
> with that frequencies/percentages can alternatively be plotted at the
> mid-point of each interval, or even at the lower limit!  Although I can 
> understand plotting graphs at the mid-point for ease of representation,
> this hardly seems suited to making interpolations.  This is because
> when you read off the graph at the upper limit of a given interval, you 
> will (probably) have more cases than fell up to and including the
> interval itself.  This is surely absurd, yet people seem to seriously
> believe it is a viable alternative.

First, make sure you're all on the same wavelength.  You clearly are 
thinking in terms of "<=" CFs;  plotting at the lower limit would be 
appropriate for "strictly <" CFs (or equivalently ">=" CFs).  Plotting at 
the midpoint would be reasonable if one took for one's CF the midpoint 
between a "strictly <" CF and a "<=" CF.  If upon examination it turns 
out that your colleagues (?) really think they're dealing with "<<=" CFs: 

You might ask them how they view the two intervals at the extreme 
ends of the CFs.  In terms of relative cumulative percents (C%s), what 
scores then apply to the upper and lower limits of (1) the lowest 
non-empty score interval;  (2) the highest score interval?  And in 
particular, what C% applies to the upper limit of the highest interval? 
Either of the two alternatives you report implies a C% > 100% here, which 
ought to be absurd enough for anyone with a decent grasp of reality.

Another approach is to inquire how one would arrange a CF 
downward -- i.e., where the C%s range from 0 at the maximum value to 
100% at the minimum, and the CFs represent the frequency of responses 
greater than or equal to this score value.  

> I’m really hoping for a good reference on this (preferably by a highly
> regarded author to make the case stronger :).  Any comments, or refs?

Sorry, can't help you here, I don't think.  It has not been my 
habit to invoke appeals to the Irrelevant Authorities at Headquarters, 
nor am I much impressed by such appeals.  If the authorities invoked are 
in fact relevant, they have logical arguments on their side, and the 
logical arguments are what one needs, not the name(s) of the authorities. 

Of course, if you're dealing with folks who DON'T have a decent 
grasp of reality, irrelevant authorities may be a surprisingly effective 
part of one's armamentarium.  In this case, look for any standard 
introductory statistics texts that deal in detail with CFs, which 
probably means texts three decades old or more (your local university 
library should have an adequate assortment), and pick one whose author(s) 
happen to be well-known in the field in which these folks think they 
operate.  (But make sure the authors' logic is correct!)
-- DFB.
 
 Donald F. Burrill [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,  [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264  

Re: What is standard deviation exactly?

2000-05-22 Thread Duncan Murdoch

On Mon, 22 May 2000 13:24:25 +1000, "Glen Barnett"
<[EMAIL PROTECTED]> wrote:

>I assume you're talking about sample standard deviations,
>not population standard deviations (though interpretation
>of what it represents is similar).
>
> ...
>
>Note that the standard deviation can't exceed half the range
>(largest value minus smallest value).

That's true for the n denominator ("population standard deviation"),
but not for n-1 ("sample standard deviation").  For example, if your
sample is just the two points 0 and 1, the sample standard deviation
is 0.71, and the range is 1.
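
(A quick Python check of both denominators on that two-point sample:)

import math

data = [0.0, 1.0]
n = len(data)
m = sum(data) / n
ss = sum((x - m) ** 2 for x in data)

print(math.sqrt(ss / n))        # 0.5      -- n denominator, exactly half the range
print(math.sqrt(ss / (n - 1)))  # 0.707... -- n-1 denominator, more than half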

Duncan Murdoch



Re: Signal detection: signal, noise and a 2nd signal?

2000-05-22 Thread Manni Heumann


Sounds just perfect. Thanks for taking the time!



[EMAIL PROTECTED] wrote:
>The question of discriminating among three or more events has been
>successfully tackled by Brian Scurfield. He extended the typical
>two-event ROC analysis to n-event ROC analysis (n>2), where results
>are expressed as n-dimensional ROC hypersurfaces, and sensitivity can
>be understood in terms of hypervolumes under the hypersurfaces. He
>also developed a new type of distribution-free sensitivity measure
>based on an information-theoretic analysis of n-event discrimination
>tasks. It gives an overall index of detectability among n
>events, and also allows sensible comparisons to be made between
>n-event tasks and (n-1)-event tasks, say.
>
>Scurfield illustrated his findings using the 3-event case, so if
>you're specifically interested in that case, check out his papers:
>
>Scurfield, B.K. (1996) "Multiple-event forced-choice tasks in the
>theory of signal detectability", Journal of Mathematical Psychology,
>40(3), 253-269
>
>Scurfield, B.K. (1998) "Generalization of the theory of signal
>detectability to m-dimensional n-event forced-choice tasks", Journal
>of Mathematical Psychology, 42(1), 5-31.
>
>The JMP abstracts used to be available online, but I don't know if
>they still are.
>
>Also, there was an independent development of some of this material by
>Douglas Mossman. He had a paper in Medical Decision Making in either
>1998 or 1999 entitled "Three-way ROCs". Sorry, can't remember the
>volume.
>
>
>Hope this helps,
>
>Vit D.
>
>
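
(An aside, not from Scurfield's papers: as a flavour of the general idea,
here is a hedged three-event sketch in Python.  The analogue of area under
the two-event ROC curve is the volume under the ROC surface, estimated
below as the proportion of correctly ordered score triples.  All scores
are invented.)

from itertools import product

# Invented classifier scores for three ordered event classes.
low  = [0.10, 0.30, 0.20, 0.40]
mid  = [0.35, 0.50, 0.60, 0.45]
high = [0.70, 0.55, 0.90, 0.80]

# Volume under the three-way ROC surface, estimated as the proportion
# of triples (one score per class) that fall in the correct order.
triples = list(product(low, mid, high))
vus = sum(a < b < c for a, b, c in triples) / len(triples)
print(vus)   # 1/6 is chance level; 1.0 is perfect separation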

--

Manni



Cumulative Frequency Polygons a right way?

2000-05-22 Thread steve_humphry



Hi all,

First up, the purpose I have at hand is to make interpolations for
percentages of students who have achieved above a certain score on a
test (where this score may lie between two discrete score points on the
scale).

It seems to me cumulative frequencies should be plotted at the exact
upper limit of each interval.  This is the only simple method that
makes sense to me.

However, it has been suggested by others in the context I’m dealing
with that frequencies/percentages can alternatively be plotted at the
mid-point of each interval, or even at the lower limit!  Although I can
understand plotting graphs at the mid-point for ease of representation,
this hardly seems suited to making interpolations.  This is because
when you read off the graph at the upper limit of a given interval, you
will (probably) have more cases than fell up to and including the
interval itself.  This is surely absurd, yet people seem to seriously
believe it is a viable alternative.

I’m really hoping for a good reference on this (preferably by a highly
regarded author to make the case stronger :).  Any comments?  Any nice
references?

Thanks!

Steve.



Re: obsolete methods?

2000-05-22 Thread Paul Gardner

[EMAIL PROTECTED] wrote:
> 
> In a way, yes, they were superseded.  There is a school of thought now
> whose proponents would argue (imv, successfully) that the
> approach you’ve outlined culminated in Rasch’s Simple Logistic Model.
> Over and above the benefits of Thurstone’s comparative judgements,
> the Rasch model allows you to place person parameters and item
> parameters on the one metric, and to eliminate person parameters in the
> estimation of item parameters, and vice versa.  Most importantly,
> according to proponents, the model allows you to achieve the
> requirements of fundamental measurement (including conjoint additivity
> and invariance --I can give some explanation and/or quotes if you like)
> provided a reasonable fit of data to the model.
> 
> The Rasch model is a stochastic one in which responses are said to be
> governed only by a person parameter and an item parameter.  While
> Thurstone used the normal curve in his law of comparative judgement,
> Rasch used the logistic approximation (very close, of course).  This
> allowed him to separate person and item parameters – a very significant
> achievement imv.  In turn, he called the outcome ‘specific
> objectivity’.  That is, the estimates of item parameters are
> independent of the particular set of persons used to derive them, and
> the estimates of person parameters are independent of the particular
> set of items used to derive them (this is an algebraic fact under the
> model; the point then is – do the data fit the model?).  Such
> objectivity is key in the physical sciences (I have a quote in which
> Andrich (see below) shows how this situation applies for a = f/m,
> whereby a comparison of accelerations is independent of the force that
> is instrumental in causing them).
> 
> There are various sources of information on this.  Try www.rasch.org/
> for some discussion of the properties of the Rasch model, applications,
> and various other things.  Or there is “Rasch Models for Measurement”
> by David Andrich.  But there are various refs on the above website.  If
> you want to know anything more, just ask, and I’ll help if I can –
> though I’m relatively new to Rasch myself (I’ve done courses with David
> Andrich, who developed the ‘Extended Logistic Model’ for use with
> Likert scale data rather than dichotomous data, and himself trained
> with Rasch for a time).
> 
This is all fine, but please remember that Rasch is essentially a
sophisticated (and much more thoughtful) mathematical model for
describing the properties of items and people;  it offers no guidance on
how to write items for an attitude measurement scale.  One still has to
define constructs, write items and design an appropriate response mode. 
Rasch provides a mathematical rationale for selecting items for
inclusion in a scale, using the criterion of "fit to the model".  I
don't claim great expertise here, but when I ran an attitude scale
through a Rasch analysis and a traditional item analysis/factor analysis
(many years ago), the decisions reached about which items to include or
exclude were not too different.

I regard Rasch as a synthesis of the Thurstone and Likert techniques. 
Thurstone placed much emphasis on item calibration, getting large
numbers of judges to rate where items were located on a supposedly
interval scale, but used only a small number of items to measure
individuals' attitudes.  Likert placed much emphasis on person
measurement, using a large number of items to measure people's
attitudes, but placed less emphasis on the calibration of item
properties.  Rasch places equal emphasis on person measurement and item
calibration, and uses a common measurement scale for both.  However,
bear in mind that all are psychometric methods which attempt to measure
attitudes by producing a scale score.  I took the original question that
started off this thread to be asking whether psychometric methods were
obsolete, not whether Likert and Thurstone had been replaced by
better mathematical models.

Paul Gardner





Re: What is standard deviation exactly?

2000-05-22 Thread Paul Gardner

Glen Barnett wrote:
> 
>  In article <[EMAIL PROTECTED]>,
>  Neil  <[EMAIL PROTECTED]> wrote:
>  >I was wondering what the standard deviation means exactly?
> >
> >I've seen the equation, etc., but I don't really understand
> >what st dev is and what it is for.
> 
> I'm going to take a different tack from the one Herman has taken.
> If I tell you what you already know, my apologies.
> 
> I assume you're talking about sample standard deviations,
> not population standard deviations (though interpretation
> of what it represents is similar).
> 
> Standard deviation is an attempt to measure how "spread out"
> the values are - big standard deviation means more spread out,
> small standard deviation means closer together. A standard
> deviation of zero means all the values are the same.
> 
> Note that the standard deviation can't exceed half the range
> (largest value minus smallest value).
> 
> Standard deviation is measured in the original units. For example,
> if you record a set of lengths in mm, their standard deviation is in mm.
> 
> There is a huge variety of reasonable measures of spread.
> Standard deviation is the most used. You will get more of
> a feel for the standard deviation if you compare what it
> does to some other measures of spread.
> 
> For example, another common measure is the mean deviation -
> the average distance of observations from the mean. By contrast,
> standard deviation is the root-mean-square distance from the mean
> (as you can see from the formula**).
> 
> ** At least the n-denominator (maximum likelihood) version is the
> root-mean-square deviation; the n-1 denominator is just a constant
> times that.
> 
> This squaring puts relatively more weight on the larger deviations,
> and less weight on the smaller deviations than the mean deviation,
> but it is still a kind of weighted average of the deviations from the
> mean.
> 
> Here's a quick (tiny) example to help illustrate some of the points
> (I am using the n-1 version of the standard deviation here):
> 
> Sample 1: 4, 6, 7, 7, 8, 10
> Mean = 7, mean deviation = 4/3 = 1.333..., std deviation=2
> 
> Sample 2:   1, 5, 7, 7, 9, 13
> Mean = 7, mean deviation = 8/3 = 2.666..., std deviation =4
> 
> Note that Sample 2's values are more 'spread out' than sample 1's,
> and both of the measures of spread tell us that.
> 
> Standard deviation is used for a variety of reasons - including the
> fact that it is the square root of the variance, and variance has
> some nice properties, both in general and also particularly for
> normal r.v.'s, but s.d. is measured in original units.
> 
> Glen
> 
This is a useful summary: I'd just like to add one point to it.  People
sometimes ask, which measure of spread is "best"?  Or, why use the
standard deviation when it seems more complicated than simpler statistics
such as the mean deviation?  Various measures of spread are useful for
different purposes, but the real strength of s.d. is that many other
statistical concepts are built upon it.  Thus s.d. underpins the notion
of a standard (z) score, z score underpins the definition of Pearson
product-moment correlation, and hence linear regression; s.d. squared is
variance, and this underpins the variance theorem, analysis of variance,
F-ratio etc. etc.  Thus it's a "big idea", a substantive concept in the
structure of statistics, in a way that other measures of spread aren't.
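
(A small Python sketch of that chain, reusing Glen's samples: the same
s.d. feeds the z scores, and the z scores give Pearson r.  The function
names are mine, for illustration.)

import math

def mean(v):
    return sum(v) / len(v)

def sd(v):
    # n-1 denominator, as in Glen's example
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))

def z_scores(v):
    m, s = mean(v), sd(v)
    return [(x - m) / s for x in v]

def pearson_r(x, y):
    # Product-moment correlation as an average product of z-scores.
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

x = [4, 6, 7, 7, 8, 10]    # Glen's Sample 1
y = [1, 5, 7, 7, 9, 13]    # Glen's Sample 2
print(sd(x), sd(y))        # 2.0 and 4.0, matching the quoted post
print(pearson_r(x, y))     # 1.0 -- Sample 2 is an exact linear function of Sample 1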

There are parallels to this in other branches of science and
mathematics.  Mass times velocity (momentum) is a useful concept,
because it enters into relationships with other concepts.  So does
(1/2)m v-squared (kinetic energy).  But no one uses mass per unit
velocity, or mass times the square root of velocity, or m v-cubed,
because (as far as I know) these concepts don't enter into any
relationships which are useful for describing aspects of the world.

Paul Gardner





non balanced MANOVA

2000-05-22 Thread Manuel Castejon Limas

Dear people,

I am interested in the different ways people handle the poor behaviour of
MANOVA when it is used with unequal numbers of observations in each class
(i.e., unbalanced designs).

Any help would be appreciated.






non normal multivariate outlier detection

2000-05-22 Thread Manuel Castejon Limas

Dear people,

I'm looking for outlier detection methods for non-normal multivariate
distributions.

Any help would be appreciated.






Re: obsolete methods?

2000-05-22 Thread steve_humphry

In a way, yes, they were superseded.  There is a school of thought now
whose proponents would argue (imv, successfully) that the
approach you’ve outlined culminated in Rasch’s Simple Logistic Model.
Over and above the benefits of Thurstone’s comparative judgements,
the Rasch model allows you to place person parameters and item
parameters on the one metric, and to eliminate person parameters in the
estimation of item parameters, and vice versa.  Most importantly,
according to proponents, the model allows you to achieve the
requirements of fundamental measurement (including conjoint additivity
and invariance --I can give some explanation and/or quotes if you like)
provided a reasonable fit of data to the model.

The Rasch model is a stochastic one in which responses are said to be
governed only by a person parameter and an item parameter.  While
Thurstone used the normal curve in his law of comparative judgement,
Rasch used the logistic approximation (very close, of course).  This
allowed him to separate person and item parameters – a very significant
achievement imv.  In turn, he called the outcome ‘specific
objectivity’.  That is, the estimates of item parameters are
independent of the particular set of persons used to derive them, and
the estimates of person parameters are independent of the particular
set of items used to derive them (this is an algebraic fact under the
model; the point then is – do the data fit the model?).  Such
objectivity is key in the physical sciences (I have a quote in which
Andrich (see below) shows how this situation applies for a = f/m,
whereby a comparison of accelerations is independent of the force that
is instrumental in causing them).
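
(For anyone who hasn't met it, the model itself is compact enough to
state in a few lines of Python -- the parameter values are invented.)

import math

def rasch_prob(theta, b):
    # Simple logistic (Rasch) model:
    # P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b))
    return 1.0 / (1.0 + math.exp(-(theta - b)))

theta, b = 0.5, -0.3   # invented person ability and item difficulty, in logits
print(rasch_prob(theta, b))   # about 0.69

# Specific objectivity in miniature: the difference in log-odds between
# two items is b2 - b1 whatever theta is, so item comparisons do not
# depend on which persons supplied the responses.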

There are various sources of information on this.  Try www.rasch.org/
for some discussion of the properties of the Rasch model, applications,
and various other things.  Or there is “Rasch Models for Measurement”
by David Andrich.  But there are various refs on the above website.  If
you want to know anything more, just ask, and I’ll help if I can –
though I’m relatively new to Rasch myself (I’ve done courses with David
Andrich, who developed the ‘Extended Logistic Model’ for use with
Likert scale data rather than dichotomous data, and himself trained
with Rasch for a time).

Take care,

Steve

