Re: Cumulative Frequency Polygons a right way?
> Steve,
> Your interpretation is right because the coordinates of the ogive (graph
> of cumulative frequency / relative cumulative frequency) indicate "less
> than the upper limit".
> Jin

Thanks kindly Jin, that's what I think has to be the case.

Steve.
Re: Cumulative Frequency Polygons a right way?
Steve) Thanks very much for your response.

> One might inquire, if one were pursuing this matter in a little more
> depth, why one would not prefer a continuous approximating distribution
> (e.g., normal, if that be appropriate, as is often the case), on the
> basis either that the empirical CFs at hand represent an instance drawn
> from such an idealized population, or that the continuous function is an
> adequate approximation to the true population distribution; since the
> purpose you describe clearly is to apply the CF information to some
> (hypothetical?) set of students whose scores are not in fact represented
> in the data in hand.

Steve) Yeah, a normal approximation might be a good idea. Our data are typically close to normally distributed, though since Rasch measurement is used, no assumptions of normality are made (and given this, I don't know how the suggestion would go down, but still ...). Is it about (hypothetical) students not represented in the data? Well, yes and no. Not literally, but in essence, yes: we want to make an interpolation so as to more closely approximate how many students 'might have actually' scored below a more precise score point than our test provides for. For example, we may have 210 students with an ability (logit) of 1.32 and 330 students at 0.81, and wish to approximate more closely how many students would hypothetically score below a value of, say, 1.01. See, the percentages are reported publicly from year to year, and large fluctuations may cause a stir! We're in essence trying to anticipate what may happen with a different test but the same 'cut score' on the same scale in future years (assuming the ability distribution stays fairly stable). Of course, the only proper way to do this is to measure more accurately (not an option), but obviously I'm after the best way to approximate given what we have.

> (Of course, the problem you describe below still arises, in terms of how
> one converts from the discrete empirical CF function to the (idealized?)
> continuous function; this is much less a problem if the continuous
> function is obtained from information other than the CFs themselves --
> e.g., an approximating normal distribution would be derived from the
> empirical mean and standard deviation, not from the empirical CFs.)

Steve) I can only see us doing this if the normal (or other) distribution is a close approximation at all, or at least most, points along the scale. But thanks, this is well worth exploring.

> If by "cumulative frequency" ("CF" above) you mean "observed frequency
> of responses less than or equal to this score value", and especially if
> these CFs have been cumulated over a grouped empirical frequency
> distribution, your logic is impeccable. If you've been cumulating at
> the level of individual score values, there may be room for SOME
> quibbling.

Steve) No, I mean <, though I don't see that it makes a great deal of difference for interpolation, given that we may be talking about any point up to a couple of decimal places on a scale ranging from about -5 to +5.

> First, make sure you're all on the same wavelength. You clearly are
> thinking in terms of "<=" CFs; plotting at the lower limit would be
> appropriate for "strictly <" CFs (or equivalently ">=" CFs). Plotting
> at the midpoint would be reasonable if one took for one's CF the
> midpoint between a "strictly <" CF and a "<=" CF. If upon examination
> it turns out that your colleagues (?) really think they're dealing with
> "<=" CFs:

Steve) I'm not sure I explained in sufficient detail.
We want to make interpolations, potentially at any point on the continuum (to a couple of decimals). Nonetheless, this is something that needs to be explicitly clarified, you're right. It hasn't been to date, so far as I'm aware (I've assumed everyone means percentage below the score).

> You might ask them how they view the two intervals at the extreme ends
> of the CFs. In terms of relative cumulative percents (C%s), what scores
> then apply to the upper and lower limits of (1) the lowest non-empty
> score interval; (2) the highest score interval? And in particular, what
> C% applies to the upper limit of the highest interval? Either of the
> two alternatives you report implies a C% > 100% here, which ought to be
> absurd enough for anyone with a decent grasp of reality.

Steve) That's it! A perfect way to make the point, I think. I did think of that some time ago, but I must admit it has slipped my mind since. A reductio ad absurdum should hit the spot! Thanks again.

> Another approach is to inquire how one would arrange a CF downward --
> i.e., where the C%s range from 0 at the maximum value to 100% at the
> minimum, and the CFs represent the frequency of responses greater than
> or equal to this score value.

Steve) Yes, I've raised this. As for references, well, the logic is all that concerns me, I can assure you. However, that done, anything else to make the case would be good. I've consulted a couple of texts already, and they recommen
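For concreteness, here is a minimal sketch of the upper-limit convention and the interpolation Steve describes (my addition, not from the thread; the interval limits and frequencies are invented, in Python with numpy):

    import numpy as np

    # Hypothetical grouped data: interval upper limits (logits) and frequencies.
    upper_limits = np.array([-1.0, 0.0, 0.81, 1.32, 2.5])
    freqs = np.array([40, 120, 330, 210, 60])

    # Cumulative percentages, plotted AT each interval's upper limit.
    cum_pct = 100 * np.cumsum(freqs) / freqs.sum()

    def pct_below(x):
        # Linear interpolation along the ogive; 0% below the lowest limit.
        return np.interp(x, upper_limits, cum_pct, left=0.0)

    print(round(pct_below(1.01), 1))  # % of students below a cut score of 1.01

Plotting the same cumulative percentages at midpoints or lower limits would shift every interpolated value, which is exactly the disagreement at issue.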
Re: obsolete methods?
> first, items don't have intensity ... people do in response TO an item ...

An item's intensity is defined in terms of the response it elicits in persons. I'm using Thurstone's terminology. If you're not happy with that shorthand, fair enough! :-)

> second, just because (to use an analogy) someone scores (say on a 30
> item test) high on the test does not mean that they got all items right
> nor would we expect them to ... so, just because someone has a fairly
> strong + feeling towards a bank does not mean that they agree with (nor
> would we expect them to) all the practices of the bank ...

Who said anything about 'all practices of the bank'? That is probably not a good way to elicit responses indicative of 'level of satisfaction'. In principle, though, someone higher on satisfaction (X) than another person (Y) should tend to agree with most or all of the statements that Y agrees with, and then some more. If not, you do not have a basis for obtaining measurements (unless you're using an unfolding structure).

> scale scores (not even from a rasch developed scale) are not a true
> guttman scale

Steve) It is fairly simple to show that Rasch is a probabilistic Guttman scale -- that is, the patterns of scoring corresponding with the Guttman structure are the most probable under the Rasch model, and patterns close to that structure are more probable than ones further removed from it.

> certainly though ... one does not need the rasch model to detect these
> tendencies ...

I agree.

Steve.
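To see the 'probabilistic Guttman' claim numerically, here is a small check (my addition; the ability and difficulty values are invented) that enumerates all response patterns with a fixed total score under the Rasch model:

    from itertools import combinations
    import numpy as np

    def rasch_p(theta, deltas):
        # Probability of endorsing each item under the Rasch model.
        return 1 / (1 + np.exp(-(theta - deltas)))

    theta = 0.5                                # person location (logits)
    deltas = np.array([-1.5, -0.5, 0.5, 1.5])  # items ordered easy -> hard
    p = rasch_p(theta, deltas)

    # All patterns with total score 2, and their probabilities:
    for idx in combinations(range(4), 2):
        resp = np.zeros(4)
        resp[list(idx)] = 1
        prob = np.prod(np.where(resp == 1, p, 1 - p))
        print(resp.astype(int), round(prob, 4))

The Guttman-consistent pattern (the two easiest items endorsed) comes out most probable, and the probability falls off as patterns depart further from that structure.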
Re: LISREL and Confirmatory FA
In article <8gcm45$i6e$[EMAIL PROTECTED]>, [EMAIL PROTECTED] says...

> Try the free student version of AMOS for structural equation modeling
> http://www.smallwaters.com/amos/student.html
>
> AMOS does factor analysis, path analysis and includes online
> documentation.

There's also the free Mx package that does structural equation modeling. The link is at http://www.kdcentral.com

< snip: Michal Bojanowski's original message, quoted in full in the next post >

--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
Re: LISREL and Confirmatory FA
Try the free student version of AMOS for structural equation modeling: http://www.smallwaters.com/amos/student.html

AMOS does factor analysis and path analysis, and includes online documentation.

In article <8fjhn0$8rd$[EMAIL PROTECTED]>, "Buoy" <[EMAIL PROTECTED]> wrote:

> Hello to all
>
> I'm a Sociology student at Warsaw University finishing my 6th semester.
> During the last year I participated in a course on quantitative
> methodology. While analyzing survey data from the Polish General Social
> Survey - PGSS (which was conducted in 1992, 93, 94, 95, 96, 97, and 99;
> further information at the website of the Institute for Social Studies:
> http://andante.iss.uw.edu.pl), the group was learning about various
> methods of hypothesis testing using Jacques Tacq's "Multivariate
> Analysis in Social Science Research". I noticed the substantial lack of
> literature on the subject on the Polish book market. Apart from the
> small set of SAGE publications in the Institute's library, there are no
> books about methods of data analysis.
>
> Lately I had to perform a confirmatory factor analysis. I downloaded a
> free version of Joreskog's LISREL from the SSI site. Unfortunately there
> was no possibility of downloading the manual for free, and the prices
> were also out of my financial reach. I was wondering if any of you could
> give me the location of free tutorials and texts on confirmatory factor
> analysis and structural equation modeling available on the Internet. I'm
> also interested in some brief examples of those procedures.
>
> Thank you in advance for any help
>
> Michal Bojanowski

--
Eugene D. Gallagher
ECOS, UMASS/Boston
Re: Square root transformation
In article <[EMAIL PROTECTED]>, G. Anthony Reina <[EMAIL PROTECTED]> wrote:

> We use multiple linear regression to perform our analyses. Because we
> work with binned data (discharge frequency of a neuron) which follow a
> non-normal (Poisson) distribution, we typically use the square root
> transform on the dependent variable (discharge rate of the neuron).
> (Actually, the transformation is sqrt(spike rate + 3/8).)

Is there enough independence that the counts should be Poisson? If so, the square root transformation does stabilize the variance, but it introduces a bias. In addition, any non-linear transformation destroys the linearity of the model. The most important criterion for a regression or similar procedure is the form of the model; for a linear regression, with any number of independent variables, the linearity is most important. You COULD run a non-linear regression, using the square root of a linear combination of independent variables, or you could use a Poisson model and maximum likelihood, or others.

> I've been trying to show that some independent variables account for
> more of the variance explained in the dependent variable. However, some
> researchers in my field argue that the square root transform could
> artificially bias my results so that some independent variables account
> for more of the variance than they really should. I don't see how this
> could be from a theoretical level. Plus, I've run the multiple
> regression without the transform and seen only about a 5% difference
> (not much).

It certainly can. If one variable is more important at the low end, and another at the high end, this will happen.

> Does anybody know if these criticisms have any theoretical merit? I
> can't see how this can be so. I thought that the square-root transform
> was a pretty sound way of reducing your chance of biasing the analysis
> if the data is non-normal (which most parametric tests require).

Your tests are only approximate, anyhow. The most important thing is the form of the model; use your theoretical knowledge to decide which ones to use. It usually does not matter how good the tests are if the model is not accurate, and whatever null hypothesis you test is going to be false, anyhow. It is up to you to decide the meaning of the form of the model, without regard to statistical testing.

--
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765) 494-6054 FAX: (765) 494-0558
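Rubin's two claims -- the transform stabilizes the variance but introduces a bias -- are easy to check by simulation. A quick sketch (my addition, not from the thread):

    import numpy as np

    rng = np.random.default_rng(0)
    for lam in [1, 5, 20, 100]:
        y = rng.poisson(lam, size=200_000)
        z = np.sqrt(y + 3/8)
        print(f"lambda={lam:>3}: var(z)={z.var():.3f}, "
              f"mean(z)={z.mean():.3f}, sqrt(lambda)={np.sqrt(lam):.3f}")

For moderate lambda, var(z) settles near 1/4 regardless of the mean (the stabilization), while mean(z) does not equal sqrt(lambda) (the bias).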
Re: Cumulative Frequency Polygons a right way?
----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, May 22, 2000 1:15 AM
Subject: Cumulative Frequency Polygons a right way?

> Hi all,
>
> First up, the purpose I have at hand is to make interpolations for
> percentages of students who have achieved above a certain score on a
> test (where this score may lie between two discrete score points on the
> scale) ... ETC (see his original)

I have always assumed that the Kaplan-Meier estimator is the accepted plotting method. You will find this in one of your stat books or in some text on failure-time analysis. For large bin sets there is very little difference between the three positions you give.

DAHeiser

< snip: remainder of the original message, quoted in full >
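For what it's worth, with no censoring the Kaplan-Meier estimate reduces to one minus the empirical CDF, so it steps at the observed values themselves. A small check (my illustration, with made-up scores):

    import numpy as np

    scores = np.array([0.81] * 330 + [1.32] * 210)  # hypothetical logit scores
    t = np.unique(scores)
    n_at_risk = np.array([(scores >= v).sum() for v in t])
    d = np.array([(scores == v).sum() for v in t])
    km_surv = np.cumprod(1 - d / n_at_risk)          # S(t) just after each value
    ecdf = np.array([(scores <= v).mean() for v in t])
    print(np.allclose(km_surv, 1 - ecdf))            # True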
bad graphs
i spotted this ...

http://www.sa.psu.edu/sara/pulse/bookstore.html

about 1/2 way down the page ... see the graph titled "Penn State Bookstore Support for Activities" ... should these dots be connected?

Dennis Roberts, EdPsy, Penn State University
208 Cedar Bldg., University Park PA 16802
Email: [EMAIL PROTECTED], AC 814-863-2401, FAX 814-863-1002
WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
FRAMES: http://roberts.ed.psu.edu/users/droberts/drframe.htm
Re: Least squares Was: Re: what is s.d.?
At 12:41 PM 5/22/00 -0500, Herman Rubin wrote (in response to bob hayden's note):

> As for outliers, the appropriate meaning for them is that they
> are observations which are incorrect, or for which the assumptions
> of the model are invalid. Those should be removed, as should
> any others of that type.

i think this is either worded incorrectly or ... incorrect ... outliers might be observations that don't fit the model ... but, that does not make the observations incorrect ... you should only remove the observations IF you have some concurrent information that the values indeed ARE incorrect data points ... for some specific reason(s) ... MIScalculated ... entered wrong ... etc.

just because they don't look nice (according to some model) is not good enough

Dennis Roberts, EdPsy, Penn State University
208 Cedar Bldg., University Park PA 16802
Email: [EMAIL PROTECTED], AC 814-863-2401, FAX 814-863-1002
WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
FRAMES: http://roberts.ed.psu.edu/users/droberts/drframe.htm
Re: Square root transformation
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] says...

> We use multiple linear regression to perform our analyses. Because we
> work with binned data (discharge frequency of a neuron) which follow a
> non-normal (Poisson) distribution, we typically use the square root
> transform on the dependent variable (discharge rate of the neuron).
> (Actually, the transformation is sqrt(spike rate + 3/8).)
< snip: rest of G. Anthony Reina's original message >

You can try a straight Poisson regression. If the conclusions you obtain from a Poisson regression are consistent with those from a square-root transformation, you'd be OK. The main purpose of a square-root transform in your case is to stabilize the variance of the error terms.

--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
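A sketch of the "straight Poisson regression" suggestion (my addition; it assumes the statsmodels package is available, and the data are simulated placeholders):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 2))                    # two independent variables
    rate = np.exp(0.3 + 0.8 * X[:, 0] - 0.5 * X[:, 1])
    counts = rng.poisson(rate)                       # binned spike counts

    # Poisson GLM with log link, fit by maximum likelihood:
    pois = sm.GLM(counts, sm.add_constant(X), family=sm.families.Poisson()).fit()
    print(pois.params)

    # OLS on the square-root-transformed counts, for comparison:
    ols = sm.OLS(np.sqrt(counts + 3/8), sm.add_constant(X)).fit()
    print(ols.params)

If the relative importance of the predictors looks similar under both fits, that supports Lim's "you'd be OK" criterion; the coefficients themselves are on different scales and are not directly comparable.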
Re: sas vs s-plus for qc
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] says...

> Check out Minitab Release 13. This is the software used by most of the
> Six Sigma Black Belt companies. It has very strong DOE, SPC, Process
> Capability, and Measurement System Analysis tools. Also, make sure you
> take a look at their help tools (the manuals, on-line help, real-time
> tutorials, and their new StatGuide) - it is without a doubt best in
> class.

I wouldn't put Minitab in the same class as SAS and S-Plus. Minitab belongs to a class below that of SAS, S-Plus, and SPSS. IMO, Minitab is still good only for teaching purposes. Professional data analysts don't use Minitab.

> Also, don't underestimate the fact that Minitab Inc. has essentially one
> product: Minitab. Their support isn't watered down by a myriad of
> modules and other software (as with SPSS and SAS).
>
> You can download a full working copy (limited to 30 days of use) at
> http://www.minitab.com
>
> (I don't work for Minitab or have any connection with them except for
> being an extremely satisfied customer)
< snip: Patrick Lee's original question, quoted in full in the post below >

--
T.S. Lim
[EMAIL PROTECTED]
www.Recursive-Partitioning.com
Re: Distribution Free Tolerance Limits
The Subject was written as "Distribution Free Tolerance Limits." Here was the statement: "We're doing some research in statistical classification of abnormalities in retinal images (that is, pattern recognition), and we need to estimate the size of the sample necessary. We've heard of some tables for this purpose and would like to know if someone knows where to find them (or something similar)."

Tolerance limits? Are limits wanted in order to reject a bunch of possible outliers? Or to select the outliers? And you are asking for the "size of the sample necessary" -- in order to achieve what end? I guess I also suspect that I want to frame the problem as something other than what I consider "tolerance limits", and I don't know to what extent the question ought to be serious about "distribution free."

On 19 May 2000 00:02:52 -0700, [EMAIL PROTECTED] wrote:

> Chebycheff's Inequality redivivus!

Right - that is an ultimate distribution-free result, if I remember correctly, which only requires finite variance. (Not necessarily unimodal.) (And in practice, you ought to do a whole lot better, right? ...) I don't know what those texts may be giving -

> Tables of the tolerance factors may be found in the following two
> venerable texts. (They can also be calculated from the inequality with
> a number of numerical analysis packages for the Mac or the PC.)
>
> Engineering Statistics, 2nd Edition; Bowker and Lieberman, Prentice Hall
> Introduction to Statistical Analysis, 4th Edition; Dixon and Massey,
> McGraw Hill

< snip >

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
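As a worked illustration of how wide such distribution-free limits are (my addition, not from the thread): Chebyshev's inequality gives P(|X - mu| >= k*sigma) <= 1/k^2, so the limits mu +/- k*sigma cover at least 1 - 1/k^2 of any finite-variance distribution.

    import math

    for coverage in [0.90, 0.95, 0.99]:
        k = 1 / math.sqrt(1 - coverage)  # solve 1 - 1/k^2 = coverage
        print(f"coverage >= {coverage:.2f}: limits mu +/- {k:.2f} sigma")

Guaranteeing 95% coverage takes mu +/- 4.47 sigma, versus 1.96 sigma under normal theory -- the price of being distribution-free, and why "you ought to do a whole lot better" in practice.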
Square root transformation
We use multiple linear regression to perform our analyses. Because we work with binned data (discharge frequency of a neuron) which follow a non-normal (Poisson) distribution, we typically use the square root transform on the dependent variable (discharge rate of the neuron). (Actually, the transformation is sqrt(spike rate + 3/8).)

I've been trying to show that some independent variables account for more of the variance explained in the dependent variable. However, some researchers in my field argue that the square root transform could artificially bias my results so that some independent variables account for more of the variance than they really should. I don't see how this could be, on a theoretical level. Plus, I've run the multiple regression without the transform and seen only about a 5% difference (not much).

Does anybody know if these criticisms have any theoretical merit? I can't see how this can be so. I thought that the square-root transform was a pretty sound way of reducing your chance of biasing the analysis if the data are non-normal (most parametric tests require normality).

Thanks.
-Tony

--
G. Anthony Reina, MD
The Neurosciences Institute
10640 John Jay Hopkins Drive
San Diego, CA 92121
Phone: (858) 626-2132
FAX: (858) 626-2199
Re: non normal multivariate outlier detection
In article <8gal3d$a0e$[EMAIL PROTECTED]>, Manuel Castejon Limas <[EMAIL PROTECTED]> wrote:

> Dear people,
> I'm looking for outlier detection methods in non normal multivariate
> distributions.
> Any help would be appreciated.

The idea of an outlier depends heavily on the distribution; there is no such thing as an absolute outlier. The purpose of detecting an outlier is that it is an incorrect or spurious observation, and including it would therefore be likely to give incorrect results.

--
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765) 494-6054 FAX: (765) 494-0558
Least squares Was: Re: what is s.d.?
In article <[EMAIL PROTECTED]>, Bob Hayden <[EMAIL PROTECTED]> wrote:

> Least squares methods are in some sense optimal when the "errors"
> estimated by the residuals are normally distributed. They are
> questionable when the errors are multimodal, strongly skewed, or
> afflicted with outliers.

Least squares is not optimal without such conditions. It is valid under much weaker assumptions; the Gauss-Markov theorem does not care if the errors are multimodal or strongly skewed. In such cases, so-called robust procedures like least absolute value are likely to be invalid. If the dependent variable is linear in the "independent variables" (not necessarily functionally independent) of a model linear in the parameters, and the errors are uncorrelated with the independent variables, least squares is valid; with more assumptions, one might do better.

As for outliers, the appropriate meaning for them is that they are observations which are incorrect, or for which the assumptions of the model are invalid. Those should be removed, as should any others of that type.

--
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765) 494-6054 FAX: (765) 494-0558
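A small simulation of the Gauss-Markov point (my addition): OLS slope estimates remain unbiased when the errors are strongly skewed; normality is not required for validity.

    import numpy as np

    rng = np.random.default_rng(2)
    true_slope, n, reps = 2.0, 100, 5000
    x = rng.uniform(0, 1, n)
    slopes = np.empty(reps)
    for i in range(reps):
        e = rng.exponential(1.0, n) - 1.0   # strongly skewed, mean-zero errors
        y = 1.0 + true_slope * x + e
        slopes[i] = np.polyfit(x, y, 1)[0]  # OLS slope
    print(slopes.mean())                    # close to 2.0 despite the skew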
Re: Ann: Fortran2000.com -- All About Fortran
I wrote:

> I tried, but there was "no response from server".

It's working for me now.
Re: obsolete methods?
At 09:14 PM 5/22/00 +0800, Stephen Humphry wrote:

> It doesn't offer guidance up front exactly, no, but it provides feedback
> on whether items work, and an important (imv) conceptual framework for
> test construction. For example, if you have the Rasch model in mind,
> you look to developing items of a range of difficulty (or 'affective
> intensity'). You wouldn't necessarily think to do this if you were only
> using other techniques, yet it is surely important. Take the extreme
> example of a test in which every item is of the same affective intensity
> -- say for 'satisfaction with your bank'. Everyone who is higher than a
> certain satisfaction level would be expected to agree with all items,
> whereas everyone below a certain satisfaction would be expected to
> disagree (of course, this probably won't happen, but in reality you may
> get something approaching this situation).

first, items don't have intensity ... people do in response TO an item ...

second, just because (to use an analogy) someone scores (say on a 30 item test) high on the test does not mean that they got all items right nor would we expect them to ... so, just because someone has a fairly strong + feeling towards a bank does not mean that they agree with (nor would we expect them to) all the practices of the bank ... scale scores (not even from a rasch developed scale) are not a true guttman scale

certainly though ... one does not need the rasch model to detect these tendencies ...

Dennis Roberts, EdPsy, Penn State University
208 Cedar Bldg., University Park PA 16802
Email: [EMAIL PROTECTED], AC 814-863-2401, FAX 814-863-1002
WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
FRAMES: http://roberts.ed.psu.edu/users/droberts/drframe.htm
what is s.d.?
The standard deviation of a single batch of numbers is a typical value for the residuals (deviations from the mean). If you divide by n, it is the root-mean-square (RMS) of the residuals. You can check your calculation of the s.d. by comparing it to the residuals.

The mean is the measure of center that minimizes the sum of the squared residuals, so the s.d. is the measure of variability that goes with the mean in particular, and with least squares in general.

For simple linear regression, s is a typical value for the residuals (deviations from the regression line). For multiple regression, s is a typical value for the residuals (deviations from the model). There's a pattern here!-)

Least squares methods are in some sense optimal when the "errors" estimated by the residuals are normally distributed. They are questionable when the errors are multimodal, strongly skewed, or afflicted with outliers.

Robert W. Hayden
Department of Mathematics
Plymouth State College MSC#29
Plymouth, New Hampshire 03264 USA
82 River Street, Ashland, NH 03217-9702
(603) 968-9914 (home), fax (603) 535-2943 (work)
[EMAIL PROTECTED]
http://mathpc04.plymouth.edu
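A numeric check of the two claims above (my addition; the data values are arbitrary): the n-denominator s.d. is the RMS of the residuals, and the mean minimizes the sum of squared residuals.

    import numpy as np

    x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    resid = x - x.mean()
    print(np.sqrt(np.mean(resid**2)), x.std(ddof=0))  # identical: 2.0, 2.0

    # The sum of squared residuals is smallest about the mean (here 5.0):
    for c in [4.0, 4.5, x.mean(), 5.5]:
        print(c, np.sum((x - c)**2))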
Re: Cumulative Frequency Polygons a right way?
Steve,

Your interpretation is right because the coordinates of the ogive (graph of cumulative frequency / relative cumulative frequency) indicate "less than the upper limit".

Jin

Jineshwar Singh, Coordinator, IDS
Interdisciplinary Department
George Brown College, St. James campus
[EMAIL PROTECTED]
416-415-2089
http://www.gbrownc.on.ca/~jsingh
* You cannot control how others act but you can control how you react.

----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, May 22, 2000 2:08 AM
Subject: Cumulative Frequency Polygons a right way?

> Hi all,
>
> First up, the purpose I have at hand is to make interpolations for
> percentages of students who have achieved above a certain score on a
> test (where this score may lie between two discrete score points on the
> scale).
< snip: rest of the original message >
Re: sas vs s-plus for qc
Check out Minitab Release 13. This is the software used by most of the Six Sigma Black Belt companies. It has very strong DOE, SPC, Process Capability, and Measurement System Analysis tools. Also, make sure you take a look at their help tools (the manuals, on-line help, real-time tutorials, and their new StatGuide) - it is without a doubt best in class.

Also, don't underestimate the fact that Minitab Inc. has essentially one product: Minitab. Their support isn't watered down by a myriad of modules and other software (as with SPSS and SAS).

You can download a full working copy (limited to 30 days of use) at http://www.minitab.com

(I don't work for Minitab or have any connection with them except for being an extremely satisfied customer)

Patrick Lee wrote:

> Dear fellow newsgroupers;
> I am trying to find suitable software for quality control analysis that
> my manager is about to conduct. I have not used SAS/QC software, but I
> have used S-Plus for graphics and find that S-Plus is quicker for
> graphics. I understand that S-Plus has a DOX module and was wondering
> if anyone has had experiences, good or bad, with this software. I was
> also wondering if there is any good specialized software for QC or DOX
> analysis other than SAS or S-Plus. Thanks in advance.
>
> Patrick Lee
Re: obsolete methods?
> This is all fine, but please remember that Rasch is essentially a
> sophisticated (and much more thoughtful) mathematical model for
> describing the properties of items and people; it offers no guidance on
> how to write items for an attitude measurement scale. One still has to
> define constructs, write items and design an appropriate response mode.

It doesn't offer guidance up front exactly, no, but it provides feedback on whether items work, and an important (imv) conceptual framework for test construction. For example, if you have the Rasch model in mind, you look to developing items of a range of difficulty (or 'affective intensity'). You wouldn't necessarily think to do this if you were only using other techniques, yet it is surely important.

Take the extreme example of a test in which every item is of the same affective intensity -- say for 'satisfaction with your bank'. Everyone who is higher than a certain satisfaction level would be expected to agree with all items, whereas everyone below a certain satisfaction would be expected to disagree (of course, this probably won't happen, but in reality you may get something approaching this situation). In this case, the instrument will not effectively discriminate between a person somewhat lower on satisfaction than all your items are targeted toward (e.g. lower than the level of satisfaction needed to just agree with a certain statement) and someone far less satisfied than that again (both will simply disagree with all or most statements). Conversely, if you have a set of items which target a range of satisfaction levels, you expect different scores for most people on the test, dependent upon their particular level of satisfaction (roughly in keeping with a Guttman structure). Surely that's what you should be after! (A numeric sketch of this contrast follows at the end of this post.)

It also tells you whether categories on Likert scales function well or not. For example, 'neutral' categories don't typically work very well (mind you, I can't give empirical evidence; this is just what people have found in experience, including myself). Looking at which items fit and which don't obviously provides critical information about the nature of the construct itself. Via feedback from these sorts of things, you certainly get an idea of what kinds of items and response modes effectively elicit responses indicative of a latent trait -- that is, responses governed stochastically by item 'difficulty' and person 'ability' (or affective intensity and satisfaction).

> Rasch provides a mathematical rationale for selecting items for
> inclusion in a scale, using the criterion of "fit to the model". I
> don't claim great expertise here, but when I ran an attitude scale
> through a Rasch analysis and a traditional item analysis/factor
> analysis (many years ago), the decisions reached about which items to
> include or exclude were not too different.

Sure, this may be the case. Not necessarily, though. I have tried the same on a couple of occasions and found that the decisions were quite different based on Rasch analysis vs factor analysis. This would in fact have been expected given the Rasch analysis, because the Likert categories did not effectively discriminate with respect to the latent trait. Correlational techniques obviously rely upon having roughly equal intervals between score points.

> I regard Rasch as a synthesis of the Thurstone and Likert techniques.
> Thurstone placed much emphasis on item calibration, getting large
> numbers of judges to rate where items were located on a supposedly
> interval scale, but used only a small number of items to measure
> individuals' attitudes. Likert placed much emphasis on person
> measurement, using a large number of items to measure people's
> attitudes, but placed less emphasis on the calibration of item
> properties. Rasch places equal emphasis on person measurement and item
> calibration, and uses a common measurement scale for both. However,
> bear in mind that all are psychometric methods which attempt to measure
> attitudes by producing a scale score. I took the original question that
> started off this thread to wonder whether psychometric methods were
> obsolete, and not whether Likert and Thurstone had been replaced by
> better mathematical models.
>
> Paul Gardner

Rasch measurement is in essence equivalent to Thurstone's law of comparative judgement, except that (a) the person parameter is substituted for one of the item parameters, and (b) the logistic function is substituted for the normal. It is based on the same logic, but the above trick allows separation of person and item parameters.

Yes, all are methods which attempt to measure attitudes (or whatever) by producing a score. However, Rasch uses a non-linear 'transformation' of the raw score, so it is fundamentally different from Likert's approach (not Thurstone's, of course, in that respect). On your last statement above, many would argue that these psychometric methods are obsolete precisely becau
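Here is the numeric sketch promised above (my addition; all values are invented). One standard way to quantify targeting -- my choice of framing, not stated in the post -- is the test information, the sum of p(1-p) over items under the Rasch model:

    import numpy as np

    def test_info(theta, deltas):
        # Fisher information: sum of p*(1-p) over items under the Rasch model.
        p = 1 / (1 + np.exp(-(theta - np.array(deltas))))
        return np.sum(p * (1 - p))

    same = [0.0] * 5                      # five items, identical intensity
    spread = [-2.0, -1.0, 0.0, 1.0, 2.0]  # five items, spread intensities
    for theta in [-3.0, -1.5, 0.0, 1.5, 3.0]:
        print(theta, round(test_info(theta, same), 2),
              round(test_info(theta, spread), 2))

The same-intensity test piles all its discrimination into a narrow band around its common location; the spread test keeps usable information much further out (at theta = -3, roughly 0.37 versus 0.23), which is the targeting point made above.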
Re: Cumulative Frequency Polygons a right way?
a cumulative frequency is up to SOME point ... the problem is, WHAT is the point ... does it include THE point?

i don't really see much (if any) difference between (say we have a score scale that goes up to 50 and 1 point is given for each valid response) saying we have accumulated 53% to a score of 38 ... or, to the upper limit of 38.5 ... or, for that matter, to anywhere between 38 and (but not quite) 39 ... people can't get scores of decimal values anyway

an upper limit, by definition, is always a value that can't be achieved

At 06:08 AM 5/22/00 +0000, [EMAIL PROTECTED] wrote:

> Hi all,
>
> First up, the purpose I have at hand is to make interpolations for
> percentages of students who have achieved above a certain score on a
> test (where this score may lie between two discrete score points on the
> scale).
< snip: rest of the original message >

Dennis Roberts, EdPsy, Penn State University
208 Cedar Bldg., University Park PA 16802
Email: [EMAIL PROTECTED], AC 814-863-2401, FAX 814-863-1002
WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
FRAMES: http://roberts.ed.psu.edu/users/droberts/drframe.htm
Re: non normal multivariate outlier detection
Hello Manuel,

I think a good place to start is Barnett, V. and Lewis, T., Outliers in Statistical Data.

rick

--- "Manuel Castejon Limas" wrote:

> Dear people,
> I'm looking for outlier detection methods in non normal multivariate
> distributions.
> Any help would be appreciated.
Re: Cumulative Frequency Polygons a right way?
On Mon, 22 May 2000, [EMAIL PROTECTED] wrote:

> First up, the purpose I have at hand is to make interpolations for
> percentages of students who have achieved above a certain score on a
> test (where this score may lie between two discrete score points on the
> scale).

One might inquire, if one were pursuing this matter in a little more depth, why one would not prefer a continuous approximating distribution (e.g., normal, if that be appropriate, as is often the case), on the basis either that the empirical CFs at hand represent an instance drawn from such an idealized population, or that the continuous function is an adequate approximation to the true population distribution; since the purpose you describe clearly is to apply the CF information to some (hypothetical?) set of students whose scores are not in fact represented in the data in hand.

(Of course, the problem you describe below still arises, in terms of how one converts from the discrete empirical CF function to the (idealized?) continuous function; this is much less a problem if the continuous function is obtained from information other than the CFs themselves -- e.g., an approximating normal distribution would be derived from the empirical mean and standard deviation, not from the empirical CFs.)

> It seems to me cumulative frequencies should be plotted at the exact
> upper limit of each interval. This is the only simple method that
> makes sense to me.

If by "cumulative frequency" ("CF" above) you mean "observed frequency of responses less than or equal to this score value", and especially if these CFs have been cumulated over a grouped empirical frequency distribution, your logic is impeccable. If you've been cumulating at the level of individual score values, there may be room for SOME quibbling.

> However, it has been suggested by others in the context I'm dealing
> with that frequencies/percentages can alternatively be plotted at the
> mid-point of each interval, or even at the lower limit! Although I can
> understand plotting graphs at the mid-point for ease of representation,
> this hardly seems suited to making interpolations. This is because
> when you read off the graph at the upper limit of a given interval, you
> will (probably) have more cases than fell up to and including the
> interval itself. This is surely absurd, yet people seem to seriously
> believe it is a viable alternative.

First, make sure you're all on the same wavelength. You clearly are thinking in terms of "<=" CFs; plotting at the lower limit would be appropriate for "strictly <" CFs (or equivalently ">=" CFs). Plotting at the midpoint would be reasonable if one took for one's CF the midpoint between a "strictly <" CF and a "<=" CF.

If upon examination it turns out that your colleagues (?) really think they're dealing with "<=" CFs: You might ask them how they view the two intervals at the extreme ends of the CFs. In terms of relative cumulative percents (C%s), what scores then apply to the upper and lower limits of (1) the lowest non-empty score interval; (2) the highest score interval? And in particular, what C% applies to the upper limit of the highest interval? Either of the two alternatives you report implies a C% > 100% here, which ought to be absurd enough for anyone with a decent grasp of reality.

Another approach is to inquire how one would arrange a CF downward -- i.e., where the C%s range from 0 at the maximum value to 100% at the minimum, and the CFs represent the frequency of responses greater than or equal to this score value.
> I'm really hoping for a good reference on this (preferably by a highly
> regarded author to make the case stronger :). Any comments, or refs?

Sorry, can't help you here, I don't think. It has not been my habit to invoke appeals to the Irrelevant Authorities at Headquarters, nor am I much impressed by such appeals. If the authorities invoked are in fact relevant, they have logical arguments on their side, and the logical arguments are what one needs, not the name(s) of the authorities.

Of course, if you're dealing with folks who DON'T have a decent grasp of reality, irrelevant authorities may be a surprisingly effective part of one's armamentarium. In this case, look for any standard introductory statistics texts that deal in detail with CFs, which probably means texts three decades old or more (your local university library should have an adequate assortment), and pick one whose author(s) happen to be well-known in the field in which these folks think they operate. (But make sure the authors' logic is correct!)

-- DFB.
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264
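A numeric version of Burrill's reductio (my addition; the intervals and frequencies are invented): with "<=" cumulative percents plotted at interval midpoints, reading the last polygon segment out to the top interval's true upper limit implies a C% above 100.

    import numpy as np

    edges = np.array([0, 10, 20, 30, 40])  # hypothetical score interval limits
    freqs = np.array([5, 15, 20, 10])
    cum_pct = 100 * np.cumsum(freqs) / freqs.sum()  # [10, 40, 80, 100]

    mid = (edges[:-1] + edges[1:]) / 2              # midpoint plotting positions
    slope = (cum_pct[-1] - cum_pct[-2]) / (mid[-1] - mid[-2])
    print(cum_pct[-1] + slope * (edges[-1] - mid[-1]))  # 110.0 -- over 100%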
Re: What is standard deviation exactly?
On Mon, 22 May 2000 13:24:25 +1000, "Glen Barnett" <[EMAIL PROTECTED]> wrote:

> I assume you're talking about sample standard deviations,
> not population standard deviations (though interpretation
> of what it represents is similar).
>
> ...
>
> Note that the standard deviation can't exceed half the range
> (largest value minus smallest value).

That's true for the n denominator ("population standard deviation"), but not for n-1 ("sample standard deviation"). For example, if your sample is just the two points 0 and 1, the sample standard deviation is 0.71, and the range is 1.

Duncan Murdoch
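A quick check of Murdoch's example (my addition), using numpy's ddof argument to switch denominators:

    import numpy as np

    x = np.array([0.0, 1.0])
    print(x.std(ddof=1))  # 0.7071..., exceeds half the range (0.5)
    print(x.std(ddof=0))  # 0.5, equal to half the range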
Re: Signal detection: signal, noise and a 2nd signal?
Sounds just perfect. Thanks for taking the time!

[EMAIL PROTECTED] wrote:

> The question of discriminating among three or more events has been
> successfully tackled by Brian Scurfield. He extended typical
> two-event ROC analysis to n-event ROC analysis (n > 2), where results
> are expressed as n-dimensional ROC hypersurfaces, and sensitivity can
> be understood in terms of hypervolumes under the hypersurfaces. He
> also developed a new type of distribution-free sensitivity measure
> based on an information-theory analysis of n-event discrimination
> tasks. The measure gives an overall measure of detectability among n
> events, and also allows sensible comparisons to be made between
> n-event tasks and (n-1)-event tasks, say.
>
> Scurfield illustrated his findings using the 3-event case, so if
> you're specifically interested in that case, check out his papers:
>
> Scurfield, B.K. (1996). "Multiple-event forced-choice tasks in the
> theory of signal detectability." Journal of Mathematical Psychology,
> 40(3), 253-269.
>
> Scurfield, B.K. (1998). "Generalization of the theory of signal
> detectability to m-dimensional n-event forced-choice tasks." Journal
> of Mathematical Psychology, 42(1), 5-31.
>
> The JMP abstracts used to be available online, but I don't know if
> they still are.
>
> Also, there was an independent development of some of this material by
> Douglas Mossman. He had a paper in Medical Decision Making in either
> 1998 or 1999 entitled "Three-way ROCs". Sorry, can't remember the
> volume.
>
> Hope this helps,
>
> Vit D.

--
Manni
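One quantity from this three-event literature that is easy to get a feel for is the volume under the ROC surface, which can be read as the probability that scores drawn from the three event classes fall in the correct order. The following Python sketch gives a Monte Carlo estimate under invented normal distributions; it is an illustration of the idea, not code from the papers cited:

    # Monte Carlo estimate of P(A < B < C) for three simulated event
    # classes; chance level is 1/6, perfect three-way ordering gives 1.0.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Invented decision-variable scores for three events, ordered by mean.
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(1.0, 1.0, n)
    c = rng.normal(2.0, 1.0, n)

    vus = np.mean((a < b) & (b < c))
    print(vus)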
Cumulative Frequency Polygons a right way?
Hi all,

First up, the purpose I have at hand is to make interpolations for percentages of students who have achieved above a certain score on a test (where this score may lie between two discrete score points on the scale).

It seems to me cumulative frequencies should be plotted at the exact upper limit of each interval. This is the only simple method that makes sense to me. However, it has been suggested by others in the context I'm dealing with that frequencies/percentages can alternatively be plotted at the mid-point of each interval, or even at the lower limit! Although I can understand plotting graphs at the mid-point for ease of representation, this hardly seems suited to making interpolations. This is because when you read off the graph at the upper limit of a given interval, you will (probably) have more cases than fell up to and including the interval itself. This is surely absurd, yet people seem to seriously believe it is a viable alternative.

I'm really hoping for a good reference on this (preferably by a highly regarded author to make the case stronger :). Any comments? Any nice references? Thanks!

Steve.
Re: obsolete methods?
[EMAIL PROTECTED] wrote:
>
> In a way, yes, they were superseded. There is a school of thought now
> in which the proponents would argue (imv successfully) that the
> approach you've outlined culminated in Rasch's Simple Logistic Model.
> Over and above the benefits of Thurstone's comparative judgements,
> the Rasch model allows you to place person parameters and item
> parameters on the one metric, and to eliminate person parameters in the
> estimation of item parameters, and vice versa. Most importantly,
> according to proponents, the model allows you to achieve the
> requirements of fundamental measurement (including conjoint additivity
> and invariance -- I can give some explanation and/or quotes if you like)
> provided there is a reasonable fit of data to the model.
>
> The Rasch model is a stochastic one in which responses are said to be
> governed only by a person parameter and an item parameter. While
> Thurstone used the normal curve in his law of comparative judgement,
> Rasch used the logistic approximation (very close of course). This
> allowed him to separate person and item parameters, a very significant
> achievement imv. In turn, he called the outcome "specific
> objectivity". That is, the estimates of item parameters are
> independent of the particular set of persons used to derive them, and
> the estimates of person parameters are independent of the particular
> set of items used to derive them (this is an algebraic fact under the
> model; the question then is whether the data fit the model). Such
> objectivity is key in the physical sciences (I have a quote in which
> Andrich (see below) shows how this situation applies for a = f/m,
> whereby a comparison of accelerations is independent of the force that
> is instrumental in causing them).
>
> There are various sources of information on this. Try www.rasch.org/
> for some discussion of the properties of the Rasch model, applications,
> and various other things. Or there is "Rasch Models for Measurement"
> by David Andrich. But there are various refs on the above website. If
> you want to know anything more, just ask, and I'll help if I can,
> though I'm relatively new to Rasch myself (I've done courses with David
> Andrich, who developed the Extended Logistic Model for use with
> Likert scale data rather than dichotomous data, and himself trained
> with Rasch for a time).

This is all fine, but please remember that Rasch is essentially a sophisticated (and much more thoughtful) mathematical model for describing the properties of items and people; it offers no guidance on how to write items for an attitude measurement scale. One still has to define constructs, write items and design an appropriate response mode. Rasch provides a mathematical rationale for selecting items for inclusion in a scale, using the criterion of "fit to the model". I don't claim great expertise here, but when I ran an attitude scale through a Rasch analysis and a traditional item analysis/factor analysis (many years ago), the decisions reached about which items to include or exclude were not too different.

I regard Rasch as a synthesis of the Thurstone and Likert techniques. Thurstone placed much emphasis on item calibration, getting large numbers of judges to rate where items were located on a supposedly interval scale, but used only a small number of items to measure individuals' attitudes. Likert placed much emphasis on person measurement, using a large number of items to measure people's attitudes, but placed less emphasis on the calibration of item properties.
Rasch places equal emphasis on person measurement and item calibration, and uses a common measurement scale for both. However, bear in mind that all of these are psychometric methods which attempt to measure attitudes by producing a scale score. I took the original question that started off this thread to be asking whether psychometric methods were obsolete, and not whether Likert and Thurstone had been replaced by better mathematical models.

Paul Gardner
Reader in Education and Director, Research Degrees,
Faculty of Education, Monash University, Vic. Australia 3800
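For readers who haven't met it, the "simple logistic model" referred to in this thread can be stated in a few lines: the probability of a correct response depends only on the difference between a person parameter (theta) and an item difficulty (b). A minimal Python sketch, with invented parameter values:

    # The Rasch item response function: P(X = 1 | theta, b).
    import math

    def rasch_prob(theta, b):
        """Probability of a correct response under the Rasch model."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    # One person (ability 1.0 logits) meeting items of increasing difficulty:
    for b in (-1.0, 0.0, 1.0, 2.0):
        print(b, round(rasch_prob(1.0, b), 3))  # 0.881, 0.731, 0.5, 0.269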
Re: What is standard deviation exactly?
Glen Barnett wrote:
>
> In article <[EMAIL PROTECTED]>,
> Neil <[EMAIL PROTECTED]> wrote:
> > I was wondering what the standard deviation means exactly?
> >
> > I've seen the equation, etc., but I don't really understand
> > what st dev is and what it is for.
>
> I'm going to take a different tack to the one Herman has taken.
> If I tell you what you already know, my apologies.
>
> I assume you're talking about sample standard deviations,
> not population standard deviations (though interpretation
> of what it represents is similar).
>
> Standard deviation is an attempt to measure how "spread out"
> the values are - a big standard deviation means more spread out,
> a small standard deviation means closer together. A standard
> deviation of zero means all the values are the same.
>
> Note that the standard deviation can't exceed half the range
> (largest value minus smallest value).
>
> Standard deviation is measured in the original units. For example,
> if you record a set of lengths in mm, their standard deviation is in mm.
>
> There is a huge variety of reasonable measures of spread.
> Standard deviation is the most used. You will get more of
> a feel for the standard deviation if you compare what it
> does to some other measures of spread.
>
> For example, another common measure is the mean deviation -
> the average distance of observations from the mean. By contrast,
> standard deviation is the root-mean-square distance from the mean
> (as you can see from the formula**).
>
> ** At least the n-denominator (maximum likelihood) version is the
> root-mean-square deviation; the n-1 denominator version is just a
> constant times that.
>
> This squaring puts relatively more weight on the larger deviations,
> and less weight on the smaller deviations, than the mean deviation
> does, but it is still a kind of weighted average of the deviations
> from the mean.
>
> Here's a quick (tiny) example to help illustrate some of the points
> (I am using the n-1 version of the standard deviation here):
>
> Sample 1: 4, 6, 7, 7, 8, 10
> Mean = 7, mean deviation = 4/3 = 1.333..., std deviation = 2
>
> Sample 2: 1, 5, 7, 7, 9, 13
> Mean = 7, mean deviation = 8/3 = 2.666..., std deviation = 4
>
> Note that Sample 2's values are more 'spread out' than Sample 1's,
> and both of the measures of spread tell us that.
>
> Standard deviation is used for a variety of reasons - including the
> fact that it is the square root of the variance, and variance has
> some nice properties, both in general and also particularly for
> normal r.v.'s, but s.d. is measured in original units.
>
> Glen

This is a useful summary; I'd just like to add one point to it. People sometimes ask: which measure of spread is "best"? Or: why use the standard deviation, when it seems more complicated than simpler statistics such as the mean deviation? Various measures of spread are useful for different purposes, but the real strength of the s.d. is that many other statistical concepts are built upon it. Thus the s.d. underpins the notion of a standard (z) score; the z score underpins the definition of the Pearson product-moment correlation, and hence linear regression; the s.d. squared is the variance, and this underpins the variance theorem, analysis of variance, the F-ratio, etc. Thus it's a "big idea", a substantive concept in the structure of statistics, in a way that other measures of spread aren't.

There are parallels to this in other branches of science and mathematics. Mass times velocity (momentum) is a useful concept, because it enters into relationships with other concepts.
So does (1/2)mv^2 (kinetic energy). But no one uses mass per unit velocity, or mass times the square root of velocity, or mv^3, because (as far as I know) these concepts don't enter into any relationships which are useful for describing aspects of the world.

Paul Gardner
Reader in Education and Director, Research Degrees,
Faculty of Education, Monash University, Vic. Australia 3800
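The tiny samples quoted above are easy to check; this short Python sketch reproduces the mean deviation and the n-1 standard deviation for both:

    # Verify the two worked examples: mean deviation vs. (n-1) std deviation.
    import math

    def spread(sample):
        n = len(sample)
        mean = sum(sample) / n
        mean_dev = sum(abs(x - mean) for x in sample) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        return mean_dev, sd

    print(spread([4, 6, 7, 7, 8, 10]))  # (1.333..., 2.0)
    print(spread([1, 5, 7, 7, 9, 13]))  # (2.666..., 4.0)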
non-balanced MANOVA
Dear people,

I am interested in knowing about the different ways people deal with the poor behaviour of MANOVA when it is applied to unbalanced designs, i.e. with unequal numbers of observations in each class. Any help would be appreciated.
non-normal multivariate outlier detection
Dear people,

I'm looking for outlier detection methods for non-normal multivariate distributions. Any help would be appreciated.
Re: obsolete methods?
In a way, yes, they were superseded. There is a school of thought now in which the proponents would argue (imv successfully) that the approach you've outlined culminated in Rasch's Simple Logistic Model. Over and above the benefits of Thurstone's comparative judgements, the Rasch model allows you to place person parameters and item parameters on the one metric, and to eliminate person parameters in the estimation of item parameters, and vice versa. Most importantly, according to proponents, the model allows you to achieve the requirements of fundamental measurement (including conjoint additivity and invariance -- I can give some explanation and/or quotes if you like) provided there is a reasonable fit of data to the model.

The Rasch model is a stochastic one in which responses are said to be governed only by a person parameter and an item parameter. While Thurstone used the normal curve in his law of comparative judgement, Rasch used the logistic approximation (very close of course). This allowed him to separate person and item parameters, a very significant achievement imv. In turn, he called the outcome "specific objectivity". That is, the estimates of item parameters are independent of the particular set of persons used to derive them, and the estimates of person parameters are independent of the particular set of items used to derive them (this is an algebraic fact under the model; the question then is whether the data fit the model). Such objectivity is key in the physical sciences (I have a quote in which Andrich (see below) shows how this situation applies for a = f/m, whereby a comparison of accelerations is independent of the force that is instrumental in causing them).

There are various sources of information on this. Try www.rasch.org/ for some discussion of the properties of the Rasch model, applications, and various other things. Or there is "Rasch Models for Measurement" by David Andrich. But there are various refs on the above website. If you want to know anything more, just ask, and I'll help if I can, though I'm relatively new to Rasch myself (I've done courses with David Andrich, who developed the Extended Logistic Model for use with Likert scale data rather than dichotomous data, and himself trained with Rasch for a time).

Take care,

Steve
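The "specific objectivity" claim is easy to illustrate numerically: under the model the log-odds of success works out to theta - b, so a comparison of two items (their log-odds difference) comes out the same whatever the person's ability. A short Python sketch, with invented parameter values:

    # Under the Rasch model, the log-odds difference between two items
    # equals the difference of their difficulties, for every person.
    import math

    def prob(theta, b):
        """Rasch probability of a correct response."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    def log_odds(theta, b):
        p = prob(theta, b)
        return math.log(p / (1.0 - p))  # algebraically equal to theta - b

    b1, b2 = -0.5, 1.5  # two invented item difficulties
    for theta in (-2.0, 0.0, 2.0):
        # The item comparison (b2 - b1 = 2.0) does not depend on theta.
        print(theta, log_odds(theta, b1) - log_odds(theta, b2))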