July Sale on Toner Cartridges!!!
D J Printing Corporation, 2564 Cochise Drive, Acworth, GA 30102, 770-974-8228, [EMAIL PROTECTED]

--LASER, FAX AND COPIER PRINTER TONER CARTRIDGES--
*WE ACCEPT GOVERNMENT, SCHOOL AND UNIVERSITY PURCHASE ORDERS*
***FREE SHIPPING WITH ANY ORDER OF $200 OR MORE!!!***

APPLE
LASER WRITER SELECT 300/310/360          $60
LASER WRITER PRO 600/630 OR 16/600       $60
LASER WRITER 300/320 OR 4/600            $45
LASER WRITER LS/NT/NTR/SC                $50
LASER WRITER 2NT/2NTX/2SC/2F/2G          $50
LASER WRITER 12/640                      $60

HEWLETT PACKARD
LASERJET SERIES 1100/1100A (92A)         $40
LASERJET SERIES 2100/SE/XI/M/TN (96A)    $70
LASERJET SERIES 2/2D/3/3D (95A)          $43
LASERJET SERIES 2P/2P+/3P (75A)          $55
LASERJET SERIES 3SI/4SI (91A)            $75
LASERJET SERIES 4/4M/4+/4M+/5/5M/5N (98A) $55
LASERJET SERIES 4L/4ML/4P/4MP (74A)      $40
LASERJET SERIES 4000/T/N/TN (27X-HIGH YIELD) $70
LASERJET SERIES 4V/4MV                   $80
LASERJET SERIES 5000 (29X)               $95
LASERJET SERIES 5L/6L                    $39
LASERJET SERIES 5P/5MP/6P/6MP            $50
LASERJET SERIES 5SI/5SI MX/5SI MOPIER/8000 $85
LASERJET SERIES 8100/N/DN (82X)          $115

HEWLETT PACKARD LASERFAX
LASERFAX 500/700, FX1                    $50
LASERFAX 5000/7000, FX2                  $50
LASERFAX FX3                             $60
LASERFAX FX4                             $65

LEXMARK
OPTRA 4019, 4029 HIGH YIELD              $130
OPTRA R, 4039, 4049 HIGH YIELD           $135
OPTRA S, 4059 HIGH YIELD                 $135
OPTRA N                                  $110

EPSON LASER TONER
EPL-7000/7500/8000                       $95
EPL-1000/1500                            $95

EPSON INK JET
STYLUS COLOR 440/640/740/760/860 (COLOR) $20
STYLUS COLOR 740/760/860 (BLACK)         $20

CANON
LBP-430                                  $45
LBP-460/465                              $55
LBP-8 II                                 $50
LBP-LX $54   LBP-NX $90   LBP-AX $49
LBP-EX $59   LBP-SX $49   LBP-BX $90
LBP-PX $49   LBP-WX $90   LBP-VX $59

CANON FAX
FAX L700 THRU L790 (FX1)                 $55
FAX L5000 THRU L7000 (FX2)               $55

CANON COPIERS
PC 1/2/3/6/6RE/7/8/11/12/65 (A30)        $69
PC 210 THRU 780 (E40/E31)                $80
PC 300/400 (E20/E16)                     $80

NEC
SERIES 2 LASER MODEL 90/95               $100
SUPERSCRIPT 860                          $115

PLEASE NOTE:
***FREE SHIPPING WITH ANY ORDER OF $200 OR MORE!!!***
* ALL OF OUR PRICES ARE IN US DOLLARS
* WE SHIP UPS GROUND. ADD $6.50 FOR SHIPPING AND HANDLING
* WE ACCEPT ALL MAJOR CREDIT CARDS OR COD ORDERS.
* COD CHECK ORDERS ADD $3.50 TO YOUR SHIPPING COST.
* OUR STANDARD MERCHANDISE REPLACEMENT POLICY IS NET 90 DAYS.
* WE DO NOT SELL TO RESELLERS OR BUY FROM DISTRIBUTORS.
* WE DO NOT CARRY: BROTHER, MINOLTA, KYOCERA, PANASONIC, XEROX, FUJITSU, OKIDATA OR SHARP PRODUCTS.
* WE ALSO DO NOT CARRY: DESKJET OR BUBBLEJET SUPPLIES.
* WE DO NOT BUY FROM OR SELL TO RECYCLERS OR REMANUFACTURERS.

-PLACE YOUR ORDER AS FOLLOWS-
1) BY PHONE: (770) 974-8228
2) BY MAIL: D AND J PRINTING CORPORATION, 2564 COCHISE DR, ACWORTH, GA 30102
3) BY INTERNET: [EMAIL PROTECTED]

INCLUDE THE FOLLOWING INFORMATION WHEN YOU PLACE YOUR ORDER:
1) YOUR PHONE NUMBER
2) COMPANY NAME
3) SHIPPING ADDRESS
4) CONTACT NAME
5) ITEMS NEEDED WITH QUANTITIES
6) METHOD OF PAYMENT (COD OR CREDIT CARD)
7) CREDIT CARD NUMBER WITH EXPIRATION DATE
** IF YOU ARE ORDERING BY
New Opportunity
An excellent opportunity to utilise your technical skills within the broader drug development arena as a:

Biostatistician - South-East England

You will join a major international pharmaceutical company, committed to the development of innovative new therapies for the treatment of respiratory disease. As a key member of the Clinical Development team, you will bring statistical expertise to the design, analysis and interpretation of Phase II to III trials, utilising your knowledge to influence the overall direction of clinical development programmes.

To succeed in this challenging role you will have an MSc or PhD in Biostatistics / Statistics, backed by at least 2 years' experience within the pharmaceutical industry or a contract research organisation. An enthusiastic self-starter, you will have the interpersonal skills necessary to succeed within a multi-disciplinary team and to effectively communicate statistical concepts and information to non-statisticians.

To apply, send your CV, ideally by e-mail as a Word document, to Dr Kay Wardle at [EMAIL PROTECTED], quoting reference 01156Go. Alternatively, call first for a brief, confidential discussion on 00 44 1707 280815.

= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: likert scale items
At 07:26 AM 7/25/01 -0400, Teen Assessment Project wrote:

> I am using a measure with likert scale items. Original psychometrics for the measure included factor analysis to reduce the 100 variables to 20 composites. However, since the variables are not interval, shouldn't non-parametric tests be done to determine group differences (by gender, age, income) on the variables?

what were you assuming about the variables when you did a factor analysis on them???

> Can I still use the composites... was it appropriate to do the original factor analysis on ordinal data?

_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
vote counting
In a certain process, there are millions of people voting for thousands of candidates. The top N will be declared winners. But the counting process is flawed, and with probability 'p' a vote will be miscounted (it might be counted for the wrong candidate or it might be counted for a non-existent candidate). What is the probability that the counted top N will correspond to the real top N? (There are actually two cases here: one where I want the top N to be in the correct order, and one where I don't care whether the order is correct.)

Thanks for any ideas,
Sanford Lefkowitz
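Sanford's question can at least be explored numerically. The sketch below is a toy Monte Carlo under strong assumptions the post does not state: each vote is miscounted independently with probability p, and a miscounted vote lands on a uniformly random candidate. The electorate and vote totals are invented and far smaller than "millions".

```python
import random

def top_n_matches(true_counts, p, n_top, order_matters, rng):
    """Recount all votes, miscounting each with probability p,
    then check whether the counted top N matches the real top N."""
    candidates = list(true_counts)
    counted = {c: 0 for c in candidates}
    for c in candidates:
        for _ in range(true_counts[c]):
            if rng.random() < p:
                # miscount: credit a uniformly random candidate
                # (possibly even the right one)
                counted[rng.choice(candidates)] += 1
            else:
                counted[c] += 1
    real_top = sorted(candidates, key=lambda c: -true_counts[c])[:n_top]
    seen_top = sorted(candidates, key=lambda c: -counted[c])[:n_top]
    if order_matters:
        return seen_top == real_top
    return set(seen_top) == set(real_top)

rng = random.Random(0)
# invented vote totals: 20 candidates, 40-vote gaps between ranks
true_counts = {"c%d" % i: 1000 - 40 * i for i in range(20)}
trials = 100
hits = sum(top_n_matches(true_counts, p=0.02, n_top=5,
                         order_matters=False, rng=rng)
           for _ in range(trials))
print("estimated P(top-5 sets agree):", hits / trials)
```

With 40-vote gaps and p = 0.02 the top-5 set is almost always preserved; shrinking the gaps or raising p drives the probability down, which makes the dependence on the shape of the vote distribution concrete.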
Re: vote counting
At 09:33 AM 7/25/01 -0400, Sanford Lefkowitz wrote:

> In a certain process, there are millions of people voting for thousands of candidates. The top N will be declared winners. But the counting process is flawed and with probability 'p', a vote will be miscounted. (it might be counted for the wrong candidate or it might be counted for a non-existent candidate.)

could you elaborate on a real context for something like this? sure, in elections, millions of people vote for thousands of candidates, BUT winners are not determined by the top N number of votes across the millions. for example, in Utah the winner might have a very SMALL fraction of the millions, but in NY state a LOSER might have a very LARGE fraction of the millions. so, a little more detail about a real context might be helpful.
output
for a class i used an example from Moore and McCabe: a 2-factor ANOVA case, 4 levels of factor A, 4 levels of factor B, completely randomized design, n=10 in each of the 16 cells.

now, after the data are stacked, so that the data are in one column and the codes for the two independent variables are in two other columns, it is easy to get a nice graph and do the ANOVA, which yielded one main effect and a significant interaction. graph = 1 page; ANOVA output = part of 1 page.

now, if you wanted to do some multiple comparisons, say the Tukey test, there is an option in the Minitab glm command to do this. think of it: 16 means, all possible comparisons. minitab not only produces confidence intervals (which i like) but also all possible t test statistics. THAT yielded 12 pages of output!

reading statistical output these days is really complicated, due (partly) to THAT: the volume of possible output becomes huge, hence the confusion factor of reading (heaven forbid, understanding!) what is there drastically increases.

_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
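The 12 pages are no mystery once you count the comparisons: with 16 cell means, Tukey's procedure reports every pair. A one-liner (Python shown, but any language works) confirms the count:

```python
from math import comb

cells = 4 * 4           # 4 levels of A x 4 levels of B = 16 cell means
pairs = comb(cells, 2)  # every pairwise comparison Tukey reports
print(pairs)            # prints 120
```

120 comparisons, each with a confidence interval and a t statistic, fills 12 pages easily.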
Re: vote counting
The answers to your questions depend heavily on structural information that you almost certainly don't have, else one would not bother to have arranged a voting process. But consider two very different cases: A. Voters are absolutely indifferent to candidates: that is, all the candidates are equally attractive, or equally preferred by the voters. Then the identity of the candidate with the most votes is purely random, and the probability that the counted top N will correspond to the real top N will be very low indeed (in part because there IS no real top N; but even in the sense that another vote taken tomorrow would be very unlikely to reproduce the same set of top N, let alone in the same order). B. Some candidates are strongly preferred to others (by the voters as a whole, that is, as a population), and exactly N such candidates are so preferred. About the rest the voters are indifferent, on the whole. In these circumstances, one would expect a large difference between the number of votes cast for the least of the N and the number of votes cast for the greatest of the remaining candidates, and the probability that the counted top N will correspond to the real top N would be rather high (depending in part on how large 'p' is). I do not see how to estimate such a probability in the absence of any information about the distribution of preferences. I've assumed that by counting votes you mean that each voter casts exactly one ballot for (at most?) one candidate. For other voting schemes (e.g., vote for K candidates, K .LE. N, and specify one's preferences among them by assigning each candidate a preference from 1 (most favored) to K (least favored)) it is imaginable that answers to your questions might not differ, but showing that to be the case (or not) is another matter entirely. It also occurs to me that a single probability 'p' of error in voting must be a global average and is an oversimplification almost certainly. 
In case A above, the results of an election might be dominated by voters whose personal 'p' is large; although, again, it is not clear to me how one might show such a thing formally. -- DFB.

On Wed, 25 Jul 2001, Sanford Lefkowitz wrote:

> In a certain process, there are millions of people voting for thousands of candidates. The top N will be declared winners. But the counting process is flawed and with probability 'p', a vote will be miscounted. (it might be counted for the wrong candidate or it might be counted for a non-existent candidate.)

The latter would constitute a spoiled ballot, or not?

> What is the probability that the counted top N will correspond to the real top N? (there are actually two cases here: 1 where I want the order of the top N to be in the correct order and the other where I don't care if the order is correct) Thanks for any ideas, Sanford Lefkowitz

Donald F. Burrill [EMAIL PROTECTED]
184 Nashua Road, Bedford, NH 03110 603-471-7128
RE: vote counting
The case is very much like case B. A relatively small percentage of candidates (maybe about 15%) will have a significant number of votes. A large number of candidates will have only 1 or 2 votes. It is the case that each voter gets only one vote. It is possible (but non-trivial) to estimate the shape of the distribution of the number of votes received. It probably is a major oversimplification to assume the probability 'p' of error is constant across all sources, but it would be highly impractical to assume otherwise.

-----Original Message-----
From: Donald Burrill [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 25, 2001 11:33 AM
To: Sanford Lefkowitz
Cc: [EMAIL PROTECTED]
Subject: Re: vote counting

snip
Re: output
perhaps we need software to have 2 overall options: show me all the output, or show me a condensed view. in the case of some interaction plots, find a graphing method, using different symbols, that represents ON the graph which pairs are different from the others (ie, any pair of DARK dots means different, whereas a DARK dot and a LIGHT dot means no difference), given that we have adopted some preset alpha. or, put a little table FIRST in the output that simply lists the combinations and says next to them YES or NO, without all the other peripherals included.

_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
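The "little table FIRST" is easy to mock up. A minimal sketch, assuming pairwise p-values are already in hand from some multiple-comparison procedure (the pairs and p-values below are invented for illustration):

```python
# invented pairwise p-values from a hypothetical Tukey run
pvals = {("A1", "A2"): 0.003, ("A1", "A3"): 0.210,
         ("A1", "A4"): 0.041, ("A2", "A3"): 0.640,
         ("A2", "A4"): 0.008, ("A3", "A4"): 0.072}
alpha = 0.05  # the preset alpha

print("pair     different?")
for (a, b), p in sorted(pvals.items()):
    print("%s-%s    %s" % (a, b, "YES" if p < alpha else "NO"))
```

Six lines of YES/NO instead of pages of intervals and t statistics; the full output could still be available behind the second option.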
Re: variance estimation and cross-validation
From: Mark Everingham ([EMAIL PROTECTED])
Subject: variance estimation and cross-validation
Newsgroups: sci.stat.math, sci.stat.edu, sci.stat.consult, sci.math
Date: 2001-07-24 03:14:05 PST

I'm not familiar with your research area, but if I understand your data correctly you have N image measurements per k folds, so that N=8 and k=10 in your example below.

> I have a set of N images. I train a classifier to label pixels in an image as one of a set of classes. To estimate the accuracy of the classifier I use cross-validation with k folds, training on k-1 and testing on 1. Thus the estimated accuracy on an image is mu = mean(mean[i], i=1..k), where mean[i] is the mean accuracy across the images in fold i.

In the line above you define mu as the mean accuracy over images and folds, but above the formula for mu, you say that it is the accuracy on an image.

> I also want to know how much the accuracy varies from one image to another. I can think of two ways of estimating this:

So, each observation is a measurement of accuracy? As you focus on variances below, you seem to be interested in the variability of the measurements, perhaps primarily within folds? Depending upon your design and sampling, a variance components analysis may be appropriate to answer your question. For example, if the within- and between-fold errors were independent, you could compare their variances.

> (a) sigma^2 = mean(var[i], i=1..k), where var[i] is the variance of the accuracy across the images in fold i

One disadvantage of this approach is that you assume that the covariance between measurements in the same fold is zero (it doesn't look to be enormous in your data below, but it could still be important). Another is that, in your example, each var[i] is based only upon 8 measurements.

> or (b) sigma^2 = var(mean[i], i=1..k) * n, where n is the number of images in each of the folds.
With n=1 in the above formula (how do you interpret it otherwise?), you are ignoring the within-fold variance, which you might do if you have reason to believe that it is small. Though your example data below seem to suggest otherwise: the variance for fold 3 looks unusually small. It might be a good idea to first estimate the between- and within-fold variance components, and possibly others, depending upon the details of your study design, how much data you really have, and which assumptions you are willing to make. Different approaches could be used to look at accuracy in other ways.

> An example:
>
> fold   mean    var
>  1     91.43    36.2404
>  2     89.05    58.3696
>  3     97.39     3.3856
>  4     89.38    78.1456
>  5     91.09   104.858
>  6     88.49    87.4225
>  7     86.59   148.596
>  8     90.36    97.8121
>  9     86.05    77.6161
> 10     88.98   125.44
>
> n = 8 (fold size)
> mu = 89.881
> sigma^2 by (a) = 81.7886 (sigma = 9.0437)
> sigma^2 by (b) = 71.7367 (sigma = 8.4698)
>
> Which estimate is better, or are both incorrect? I appreciate that the fold size (8) and number of folds (10) are small. Is there a better way? Is there any way to establish a confidence interval on the estimate?

Exact confidence intervals can be easily calculated for many variance components when the data are balanced, though for unbalanced data you'll usually have to settle for approximate methods. See the texts below for some details on OLS approaches. I don't know a good reference on ML methods for computing CIs on variance components.

Snedecor GW, Cochran WG. Statistical Methods (8th ed.). Ames, IA: Iowa State University Press, 1989.
Burdick RK, Graybill FA. Confidence Intervals on Variance Components. New York: Marcel Dekker, 1992.
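The two estimates can be reproduced directly from the fold summaries in the post. Note that estimate (b) matches the poster's figure only if the *population* variance (divide by k, not k-1) of the fold means is used; that choice is my inference from the numbers, not something the post states.

```python
from statistics import mean, pvariance

fold_means = [91.43, 89.05, 97.39, 89.38, 91.09,
              88.49, 86.59, 90.36, 86.05, 88.98]
fold_vars  = [36.2404, 58.3696, 3.3856, 78.1456, 104.858,
              87.4225, 148.596, 97.8121, 77.6161, 125.44]
n = 8  # images per fold

mu = mean(fold_means)                  # overall accuracy
sigma2_a = mean(fold_vars)             # (a) mean within-fold variance
sigma2_b = pvariance(fold_means) * n   # (b) n * variance of fold means
print(round(mu, 3), round(sigma2_a, 4), round(sigma2_b, 4))
```

This prints mu = 89.881 and sigma^2 = 81.7886 by (a), matching the post; (b) comes out at roughly 71.737, agreeing with the post's 71.7367 up to rounding.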
Re: likert scale items
The following is extracted from one of my webpages. Hope it can help:

--

The issue regarding the appropriateness of ordinal-scaled data in parametric tests was unsettled even in the eyes of Stevens (1951), the inventor of the four levels of measurement: "As a matter of fact, most of the scales used widely and effectively by psychologists are ordinal scales ... there can be involved a kind of pragmatic sanction: in numerous instances it leads to fruitful results" (p. 26). Based on the central limit theorem and Monte Carlo simulations, Baker, Hardyck, and Petrinovich (1966) and Borgatta and Bohrnstedt (1980) argued that for typical data, worrying about whether scales are ordinal or interval doesn't matter.

Another argument against refusing interval-based statistical techniques for ordinal data was suggested by Tukey (1986). In Tukey's view, this refusal was a historically unfounded overreaction. In physics, before precise measurements were introduced, many physical measurements were only approximately interval scales. For example, temperature measurement was based on liquid-in-glass thermometers. But it would be unreasonable not to use a t-test to compare two groups of such temperatures. Tukey argued that researchers painted themselves into a corner on such matters because we were too obsessed with sanctification by precision and certainty: if our p-values or confidence intervals are to be sacred, they must be exact. In the practical world, when data values are transformed (e.g. transforming y to sqrt(y), or log y), the p-values resulting from different expressions of the data would change. Thus, ordinal-scaled data should not be banned from entering the realm of parametric tests.

For a review of the debate concerning ordinal- and interval-scaled data, please consult Velleman and Wilkinson (1993).
from: http://seamonkey.ed.asu.edu/~alex/teaching/WBI/parametric_test.html

Chong-ho (Alex) Yu, Ph.D., MCSE, CNE
Academic Research Professional/Manager
Educational Data Communication, Assessment, Research and Evaluation
Farmer 418, Arizona State University, Tempe AZ 85287-0611
Email: [EMAIL PROTECTED]
URL: http://seamonkey.ed.asu.edu/~alex/
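Tukey's transformation point is easy to demonstrate numerically: the same two samples give different test statistics, hence different p-values, on the raw and log scales. The samples below are invented for illustration, and only the pooled t statistic is computed (the p-value follows from it).

```python
from math import log, sqrt
from statistics import mean, variance

def t_stat(x, y):
    """Pooled two-sample t statistic."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))

# invented right-skewed samples
a = [1.2, 1.9, 2.4, 3.1, 8.7, 15.2]
b = [2.0, 2.8, 3.9, 5.5, 11.1, 24.6]

t_raw = t_stat(a, b)
t_log = t_stat([log(v) for v in a], [log(v) for v in b])
print(round(t_raw, 3), round(t_log, 3))  # the two statistics differ
```

Since the t statistic changes under a monotone transformation of the data, so does the p-value, which is exactly the "sanctity" problem Tukey describes.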
Re: SRSes
Dennis Roberts wrote:

> snip ... but, we KNOW that most samples are drawn in a way that is WORSE than SRS ... thus, essentially every CI ... is too narrow ... or, every test statistic ... t or F or whatever ... has a p value that is too LOW ... what adjustment do we make for this basic problem?

The adjustment for design is done with weights to get the point estimates, using regular software such as SPSS etc. To get the confidence estimates, special software such as WESVAR, SUDAAN, or CPLEX is commonly used. Because the latter packages are not as user-friendly in their presentation of results, I usually get the point estimates in SPSS, then I use WESVAR or SUDAAN and get both point and interval estimates. I use the point estimates from the latter packages as navigational aids to find the interval estimates in the output and to assure that I am getting the right computations.

Some sampling designs include cluster sampling (random effects), some stratification (fixed effects), and some both. For designs with stratification only, if there is any difference in the means (proportions) among the strata, usually the CIs will be too wide. For designs with cluster sampling, usually the CIs will be too narrow. For designs with both stratification and clustering, the CIs will be subject to both narrowing and widening, and only specialized software will tell the net effect. In addition, ratio, regression, or difference estimates may have narrower true CIs.
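For the point-estimate half of this workflow, design weighting is just a weighted mean; the part that needs WESVAR or SUDAAN is the variance under the actual design. A minimal sketch of the weighted estimate, with invented data (each weight standing for the number of population members the respondent represents):

```python
# invented survey data: yes/no responses and design weights
responses = [1, 0, 1, 1, 0, 1, 0, 1]
weights   = [120, 80, 200, 150, 90, 60, 110, 190]

weighted_prop = (sum(w * y for w, y in zip(weights, responses))
                 / sum(weights))
unweighted_prop = sum(responses) / len(responses)
print(weighted_prop, unweighted_prop)  # 0.72 vs 0.625
```

The gap between the weighted and unweighted proportions is the design adjustment to the point estimate; the corresponding adjustment to the standard error is what the specialized packages compute.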
Re: likert scale items
On Wed, 25 Jul 2001 07:26:19 -0400, Teen Assessment Project [EMAIL PROTECTED] wrote:

> I am using a measure with likert scale items. Original psychometrics for the measure included factor analysis to reduce the 100 variables to 20 composites. However, since the variables are not interval,

This question does recur. I think there was some bad teaching in psychology departments, many years ago, but I don't think there is a textbook published in the last 20 years that doesn't regard a decent Likert scale as one of the better examples of interval scaling. (Of course, there has been some misunderstanding, too, of what "interval" is all about. Similarly, I think error is more likely to exist in class notes than within any of the current texts.) By design, a Likert total score is intended to be interval. You should keep scaling and scoring in mind for any criterion that you have.

You might check up on your individual Likert items, if you devised them yourself, to be sure that you didn't mess up your labels or your scoring. But most people don't worry about their Likert-type scores at all; treating them as interval is the standard. Your concern is not *totally* misplaced - especially in regard to separate items - but *almost*. A Likert item scored as interval, as it happens, is more robust than a Likert item that is re-expressed merely as ranks -- given the deficiencies in dealing with *ties* in the usual 'nonparametric' tests. (Logistic or normal modeling of ties is much better, but those are both rare.)

> shouldn't non-parametric tests be done to determine group differences (by gender, age, income) on the variables? Can I still use the composites... was it appropriate to do the original factor analysis on ordinal data?

You can find other (old) comments in my stats-FAQ, or use groups.google.com to search the sci.stat.* groups. I know some other people have articulated the same conclusions.
-- Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Re: vote counting
On Wed, 25 Jul 2001 09:33:41 -0400, Sanford Lefkowitz [EMAIL PROTECTED] wrote:

> In a certain process, there are millions of people voting for thousands of candidates. The top N will be declared winners. But the counting process is flawed and with probability 'p', a vote will be miscounted. (it might be counted for the wrong candidate or it might be counted for a non-existent candidate.)

For clarification: I assume you are talking about votes and winners in the thoroughly abstract, hypothetical, imaginary instance - where (for example) votes are miscounted TOTALLY AT RANDOM, and not because of ballot flaws relating to position on a ballot, etc.

> What is the probability that the counted top N will correspond to the real top N? (there are actually two cases here: 1 where I want the order of the top N to be in the correct order and the other where I don't care if the order is correct)

We could say you are referring to errors in counting that are entirely uncorrelated with each other, or with anything. I can offer this: you will need to parameterize the cases according to the spread in vote, and use some model. Is this 20% of the candidates get 80% of the vote? Or 10% get 90%? Or 1%, 99%? (There is some name for those curves - Pareto?) AND - what is your question, to be quantified? I can be sure, without doing anything, that for large N and smooth distributions the top N counts will not fall in perfect order.

-- Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
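The "spread in vote" can be parameterized with a Zipf-type rank model - my choice of model for illustration, not anything given in the thread - in which candidate r's share of the vote is proportional to 1/r^s:

```python
def zipf_shares(n_candidates, s):
    """Vote shares proportional to 1/rank^s (Zipf/Pareto-like)."""
    raw = [1.0 / r ** s for r in range(1, n_candidates + 1)]
    total = sum(raw)
    return [v / total for v in raw]

shares = zipf_shares(1000, s=1.1)
top_20pct = sum(shares[:200])
print(f"top 20% of candidates get {100 * top_20pct:.0f}% of the vote")
```

Tuning the single exponent s sweeps through the 20/80, 10/90, 1/99 shapes, so the miscount question becomes a two-parameter problem in (p, s).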
Re: SRSes
My previous remarks were about other sampling designs. I was comparing valid complex designs to SRS designs, not non-sampling case selection.

dennis roberts wrote:

> my hypothesis of course is that more often than not ... in data collection problems where sampling is involved AND inferences are desired ... we goof far more often ... than do a better than SRS job of sampling
>
> 1. i wonder if anyone has really taken a SRS of the literature ... maybe stratified by journals or disciplines ... and tried to see to what extent sampling in the investigations was done via SRS ... better than that ... or worse than that??? of course, i would expect even if this is done ... we would have a + biased figure ... since the notion is that only the better/best of the submitted stuff gets published. so, the figures for all the stuff that is done (ie, the day in day out batch), published or not ... would have to look worse.
>
> 2. can worse than SRS ... be as MUCH worse ... as complex sampling plans can be better than SRS??? that is ... could a standard error for a bad sampling plan (if we could even estimate it) ... be proportionately as much LARGER than the standard error for SRS samples ... as complex sampling plans can produce standard errors that are proportionately SMALLER than SRS samples? are there ANY data that exist on this matter?
>
> ==
> dennis roberts, penn state university
> educational psychology, 8148632401
> http://roberts.ed.psu.edu/users/droberts/drober~1.htm
Re: Nonrandomness of binary matrices
Thanks to Rich Ulrich for the suggestion below -- that was the direction I was heading, but there seem to be difficulties. The general problem is that I have a standard [n x p] data matrix, but (skipping over the scientific details) some of the values are special, typically 5-20% of them, and I want to know whether their distribution within the matrix is structured in some way. In particular, they might be concentrated in particular rows or columns, but beyond that I have no notion of nonrandom. I'm hoping that they're uniformly randomly distributed (or rather, not significantly different from random), because then I can basically ignore the fact that they're special for the scientific problem at hand. I'd like to have two things: a nicely behaved index of nonrandomness (perhaps a test statistic, rescaled to an interval 0-1?) and a significance test.

So I recoded the matrix as binary, with the special values coded as 1s. I presumed that the null marginal distributions would be binomial rather than Poisson because the frequency of occurrence is so high, but either way I could test that. And if I measured the deviations of the marginal totals from expected (as a chi-square statistic, perhaps, or a mean squared deviation), that would provide both an index and a goodness-of-fit significance test for the entire matrix.

But the problem is: what if the row totals and column totals are not independent? I've done a few 2-way chi-square contingency tests on these matrices (using randomized null distributions, of course, since the matrices are binary), and some of the results are statistically significant. Doesn't this mean that I can't simply accumulate the row and column totals for a goodness-of-fit test, since they're not always independent? And even if I did the goodness-of-fit tests for rows and columns independently, how do I combine the p-values to get a single level of significance for the entire matrix, if the tests are not independent?
I have the feeling that I'm missing something obvious here but I can't quite get a handle on it, and this little problem is holding up the analysis of the results from a much larger study. I've talked to statisticians on campus, with little progress, so basically I'm begging for help.

Rich Strauss

At 10:47 AM 7/25/01 -0400, you wrote:

> On 23 Jul 2001 14:22:58 -0700, [EMAIL PROTECTED] (Rich Strauss) wrote:
>
> > Say I have a binary data matrix for which both the rows (observations) and columns (variables) are completely permutable. (In practice, about 5-20% of the cells will contain 1's, and the remainder will contain 0's.) Assume that the expected probability of a cell containing a '1' is identical for all cells in the matrix. I'd like to be able to test this assumption by measuring (and testing the significance of) the degree of 'nonrandomness' of the 1's in the matrix. If the rows and columns were fixed in sequence, then this would be an easy problem involving spatial statistics, but the permutability seems to really complicate things. I think that I can test the rows or columns separately by comparing the row or column totals against a corresponding binomial distribution using a goodness-of-fit test, but I can't get a handle on how to do this for the entire matrix. I'd really appreciate ideas about this. Thanks in advance.
>
> I'm not sure that I grasp what you are after, but - an idea. If they are completely permutable, then permute: sort them by decreasing counts for row and for column. This puts me in mind of certain alternatives to random. The set of counts on a margin should be ... Poisson? The table can be drawn into quadrants or smaller sections, so that the number of 1s in each can be tabulated, to make ordinary contingency tables.
>
> -- Rich Ulrich, [EMAIL PROTECTED]
> http://www.pitt.edu/~wpilib/index.html
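One way to sidestep the row/column dependence that Rich Strauss worries about is to never use an analytic null at all: compute one overall statistic from both margins and calibrate it by simulation, scattering the same number of 1s uniformly at random over the cells. The joint null distribution of the combined statistic then has the dependence built in. This is only a sketch of that idea, with an invented test matrix, not a vetted procedure for the study in question.

```python
import random

def marginal_stat(matrix):
    """Chi-square-style deviation of row and column totals from uniform."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    total = sum(sum(row) for row in matrix)
    exp_row, exp_col = total / n_rows, total / n_cols
    stat = sum((sum(row) - exp_row) ** 2 / exp_row for row in matrix)
    stat += sum((sum(row[j] for row in matrix) - exp_col) ** 2 / exp_col
                for j in range(n_cols))
    return stat

def randomization_pvalue(matrix, n_sim=500, seed=0):
    """Null: the observed number of 1s scattered uniformly over the cells."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    ones = sum(sum(row) for row in matrix)
    observed = marginal_stat(matrix)
    hits = 0
    for _ in range(n_sim):
        null = [[0] * n_cols for _ in range(n_rows)]
        for cell in rng.sample(range(n_rows * n_cols), ones):
            null[cell // n_cols][cell % n_cols] = 1
        hits += marginal_stat(null) >= observed
    return (hits + 1) / (n_sim + 1)

# invented 10x10 matrix with all the 1s packed into the first two rows
m = [[1 if i < 2 else 0 for j in range(10)] for i in range(10)]
pval = randomization_pvalue(m)
print(pval)
```

The observed statistic can also be rescaled against the simulated null (e.g. as an empirical quantile) to get the 0-1 nonrandomness index asked for, and the same machinery extends to other statistics if row-or-column concentration is not the only alternative of interest.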
Re: likert scale items
for a good treatment of this issue ... levels of measurement and statistics to use ... though it is not real simple ... see

ftp://ftp.sas.com/pub/neural/measurement.html

warren sarle of SAS wrote this and it is excellent ... forget about scales and statistics for a moment ... what kinds of STATEMENTS do you want to be able to make about measurement variables ... THAT is the real issue ... and whether you should pay attention to levels of measurement and statistics ...

At 09:18 AM 7/25/01 -0700, Alex Yu wrote:

> The following is extracted from one of my webpages. Hope it can help:
Re: likert scale items
inherent problems related to Likert items and level of measurement that create problems would be these too:

1. how many response categories are there for AN item??? by the way ... likert used many types ... including YES ? NO ... at THIS level, i think it a bit presumptuous to think that we are working with interval level measurement

2. what the labelling is FOR points ON an item ... i think it is easier to pretend the item level measurement is interval IF the scale is in terms of % agreement ... rather than SA ... SD kinds of response points

3. how MANY items there are ...

now, if you have FEW items ... with FEW points ... that are like SA ? SD ... then at the item or total score level, i think it is hard to assume interval level data ... if you have MANY items that each have NUMEROUS scale points ... that are framed differently than SA to SD ... then assuming interval level data is much more tenable ...

_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
Re: likert scale items
If your items are visually anchored so as to imply equal spacing, like:

    +--------+--------+--------+--------+
    0        1        2        3        4
  least                               most
  possible                        possible

then one might accept the data as interval-level, on the assumption that respondents interpret them as such. Also keep in mind that after you add responses on several items, minor deviations of the response categories from equal spacing may matter less.

In my substance abuse and personality research with teens, I have done a lot of factor analysis on ordered-category response items. One way to avoid the assumption of equally-spaced categories (though introducing an assumption of normally distributed traits) is to perform factor analysis of polychoric correlation coefficients. For more information on polychoric correlations and their factor analysis, see:

http://ourworld.compuserve.com/homepages/jsuebersax/tetra.htm
http://ourworld.compuserve.com/homepages/jsuebersax/irt.htm

With my data, factor analysis produced mostly the same results regardless of whether polychoric correlations or regular Pearson correlations were used.

If you are concerned about creating scales by summing ordered-category responses, there is the alternative of latent trait modeling. See:

http://ourworld.compuserve.com/homepages/jsuebersax/lta.htm

and some of the links there. Again, one often finds it makes little or no practical difference: scale scores produced by simply adding item responses and scores produced by more complex latent trait models may correlate .99 or better with each other.

BTW, the original study you describe sounds so much like one I did the analysis for that I wonder if they are the same. You aren't by any chance referring to a study done in Winston-Salem, North Carolina, are you?

John Uebersax
Teen Assessment Project

[EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...

> I am using a measure with likert scale items. Original psychometrics for the measure included factor analysis to reduce the 100 variables to 20 composites.
> However, since the variables are not interval, shouldn't non-parametric tests be done to determine group differences (by gender, age, income) on the variables? Can I still use the composites ... was it appropriate to do the original factor analysis on ordinal data?
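[Editorial sketch.] Uebersax's observation that the scoring choice often makes little practical difference can be checked with a small simulation. The sketch below is mine, not his: the latent correlation, the cutpoints, the sample size, and the deliberately uneven rescoring are all illustrative assumptions. Two correlated normal traits are cut into 5 ordered categories, and the Pearson correlation is computed once with evenly spaced integer scores and once with uneven ones; the two values come out close.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def discretize(z, cuts=(-1.0, -0.3, 0.3, 1.0)):
    """Map a latent normal value to one of 5 ordered categories, 0..4.
    The cutpoints are an arbitrary illustrative choice."""
    return sum(z > c for c in cuts)

# Simulate two latent-normal traits with correlation rho, then categorize
rng = random.Random(42)
rho = 0.6
xs, ys = [], []
for _ in range(2000):
    x = rng.gauss(0, 1)
    y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
    xs.append(discretize(x))
    ys.append(discretize(y))

# Evenly spaced integer scores vs. a deliberately uneven rescoring
uneven = {0: 0, 1: 1, 2: 3, 3: 4, 4: 6}
r_even = pearson(xs, ys)
r_uneven = pearson([uneven[v] for v in xs], [uneven[v] for v in ys])
print(round(r_even, 3), round(r_uneven, 3))
```

Both correlations are attenuated relative to the latent rho (categorization discards information), but they differ from each other only slightly, which is the point being made in the post. This does not replace a proper polychoric analysis; it only illustrates the robustness claim.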
Re: likert scale items
here are a few videos of likert ... http://ollie.dcccd.edu/mgmt1374/book_contents/3organizing/org_process/Likert.htm

_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
Reach and Frequency
An advertiser purchases online ad impressions and wants to achieve a certain reach over a specified period of time. How many impressions does he have to purchase? The page-per-user distribution is known. I've published a solution on my web site, at datashaping.com/internet.shtml. However, I am wondering whether more general results are available. Thanks.

Vincent Granville, Ph.D.
www.datashaping.com
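[Editorial sketch.] For readers without access to the datashaping.com write-up: under one common simplifying model (mine, not necessarily Granville's), each impression lands on a pageview chosen uniformly at random, so a user holding k of the T total pageviews is missed by all m impressions with probability (1 - k/T)^m. Expected reach is then the sum of 1 - (1 - k/T)^m over users, and since that is increasing in m, the required impression count can be found by search:

```python
def expected_reach(pageviews, m):
    """Expected number of distinct users reached by m impressions,
    assuming each impression lands on one pageview chosen uniformly at
    random, independently of the others (a simplifying assumption)."""
    total = sum(pageviews)
    return sum(1 - (1 - k / total) ** m for k in pageviews)

def impressions_needed(pageviews, target):
    """Smallest m with expected_reach(pageviews, m) >= target.
    expected_reach is increasing in m, so doubling + binary search works.
    target must be below the number of users, or the search never ends."""
    hi = 1
    while expected_reach(pageviews, hi) < target:
        hi *= 2
    lo = 0
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_reach(pageviews, mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo

# Example: 1000 users with a skewed page-per-user distribution,
# target reach of 500 distinct users
pageviews = [1] * 800 + [10] * 150 + [100] * 50
print(impressions_needed(pageviews, 500))
```

Note how the skew matters: heavy users are reached almost immediately, so most of the impression budget goes to catching the long tail of light users, which is presumably why the page-per-user distribution, not just its mean, enters the question.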