Re: What is an outlier ?
And with bivariate data, neither component need be high or low!

Jon Cryer

At 12:14 PM 2/25/2002 -0700, you wrote:
>Of course it can be. An outlier is any value that is not usual for your data
>set.
>
>"Voltolini" <[EMAIL PROTECTED]> wrote in message
>news:002f01c1be21$65913d60$0fe9e3c8@oemcomputer...
> > Hi,
> >
> > My doubt is: can an outlier be a LOW data value in the sample (and not
> > just the highest)?
> >
> > Several textbooks don't make this clear !!!
> >
> > Thanks
> >
> > V.

=
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
http://jse.stat.ncsu.edu/
=
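The bivariate point can be made concrete with a small sketch. The data below are made up for illustration, and the squared Mahalanobis distance is just one assumed way of measuring how unusual a pair is relative to the joint scatter; the message itself does not name a method. The point (3, 7) has an x value and a y value that both sit well inside the observed ranges, yet the pair is far from the rest of the cloud.

```python
# Illustrative, made-up bivariate data: nine points close to the line y = x,
# plus one point (3, 7) whose x and y are each well inside the observed ranges.
points = [(1, 1.2), (2, 1.9), (3, 3.1), (4, 4.0), (5, 5.2),
          (6, 5.9), (7, 7.1), (8, 8.0), (9, 9.1), (3, 7.0)]


def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each point from the sample mean."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample variances and covariance (divisor n - 1).
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    dists = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # (dx, dy) S^{-1} (dx, dy)^T written out for the 2x2 case.
        dists.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return dists


d2 = mahalanobis_sq(points)
# Index of the point that is most unusual relative to the joint scatter.
most_unusual = max(range(len(points)), key=lambda i: d2[i])
print(points[most_unusual], round(d2[most_unusual], 2))
```

Looking at x alone or y alone would not single out this point; only the joint distribution does.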
Re: What is an outlier ?
Voltolini wrote:
>
> Hi,
>
> My doubt is: can an outlier be a LOW data value in the sample (and not
> just the highest)?
>
> Several textbooks don't make this clear !!!

What makes an outlier "an outlier" is your model. If your model accounts for all the observations, you can't really call any of them an outlier. If your model adequately accounts for all but one or two unusual observations, you might regard them as coming from some process other than the one that generated the data your model accounts for, and call them outliers.

Such "not adequately accounted for" observations may be low observations, or high observations, or they may actually turn out to be somewhere in the middle of the range of your data. I have seen this with time series, for example, where in some applications an autoregressive model was a very good description of a long series, apart from a few outliers in the first quarter or so of the time period (which did in the end turn out to have come from a different process, because the protocol wasn't always being properly followed early on). Two of those "outliers" - in the sense that the model didn't adequately account for them - turned out to be neither particularly high nor particularly low observations, but they were substantially higher or lower than expected from the model.

Another case where you might have "outliers" in the middle of your data is in a regression context, where a generally increasing relationship shows a tight, Gaussian-looking random scatter about the relationship, but with a couple of relatively low y-values at some of the higher x-values. The observations themselves may actually be very close to the mean of the y's, but the model of the relationship makes them "unusual".

A different model - for example, one where the observations come from a distribution which has the same expectation as a function of x, but which has a heavier tail to the left around that - might account for all the data and not find any outliers.
Glen
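The regression case described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the data are invented, the model is a straight line fitted by ordinary least squares, and "more than 2 residual standard deviations" is only a rough rule of thumb for "not adequately accounted for". The last observation has a y-value almost exactly equal to the mean of the y's, yet it is the only one the fitted line does not account for.

```python
# Made-up data: ten points close to y = 2x, plus one low y-value at x = 9.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 17.9, 20.1, 11.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least-squares fit of y = intercept + slope * x.
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
# Residual standard deviation with n - 2 degrees of freedom.
resid_sd = (sum(r ** 2 for r in residuals) / (n - 2)) ** 0.5

# Assumed rule of thumb: flag residuals more than 2 residual SDs from zero.
flagged = [i for i, r in enumerate(residuals) if abs(r) > 2 * resid_sd]

for i in flagged:
    print(f"x={x[i]}, y={y[i]}, mean of y={mean_y:.2f}, "
          f"residual={residuals[i]:.2f}")
```

Under a different model, for instance one with a heavier left tail as suggested above, the same observation might not be flagged at all, which is the point: the outlier is defined by the model, not by the raw value.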
Re: What is an outlier ?
of course, if one has control over the data, checking the coding and making sure it is correct is a good thing to do

if you do not have control over that, then there may be very little you can do with it and, in fact, you may be totally UNaware of an outlier problem

i see a potentially MUCH larger problem when ONLY certain summary statistics are shown without any basic tallies/graphs displayed ... so, IF there are some really strange outlier values, they usually will go undetected ...

correlations are ONE good case in point ... have a look at the following scatterplot ... height in inches and weight in pounds ... from the pulse data set in minitab

[Character scatterplot of Weight (pounds; axis ticks at 150 and 300) against Height (inches; axis from 32.0 to 72.0), from the Minitab pulse data set.]

now, the actual r between the X and Y is -.075 ... and of course, this seems strange

but, IF you had only seen this in a matrix of r values ... you might say that perhaps there was serious range restriction that more or less wiped out the r in this case ... but even the desc. stats might not adequately tell you of this problem

IF you had the scatterplot, you probably would figure out REAL quick that there is a PROBLEM with one of the data points ... in fact, without that one weird data point, the r is about .8 ... which makes a lot better sense when correlating heights and weights of college students

At 09:06 PM 2/25/02 +, Art Kendall wrote:
>An "outlier" is any value for a variable that is suspect given the
>measurement system, "common sense", other values for the variable in
>the data set, or the values a case has on other variables.
Dennis Roberts, 208 Cedar Bldg., University Park PA 16802
WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
AC 8148632401
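The effect of a single bad point on r is easy to reproduce. The sketch below does NOT use the Minitab pulse data; the heights and weights are made up, and the extra case (35 inches, 300 pounds) is a hypothetical miscoded record, so the r values it prints will differ from the -.075 and .8 quoted above. It only shows the mechanism: one wild point can wipe out, or even reverse, a strong positive correlation.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


# Made-up heights (inches) and weights (pounds), roughly linear.
heights = [61, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 74]
weights = [110, 120, 118, 130, 135, 140, 138, 150, 155, 160, 158, 175]

r_clean = pearson_r(heights, weights)

# Add one hypothetical miscoded case: 35 inches tall, 300 pounds.
r_all = pearson_r(heights + [35], weights + [300])

print(f"r without the odd point: {r_clean:.3f}")
print(f"r with the odd point:    {r_all:.3f}")
```

A table of r values alone would give no hint of this; the scatterplot shows it immediately.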
Re: What is an outlier ? cont'd
That being said, occasions can arise where there are outliers other than from measurement or data entry error. Different disciplines have different approaches.

What discipline are you studying? What is the variable you are concerned about? How is it measured?

Some examples of low values:
10 pounds would be a suspicious value for an adult's weight.
Few college students are under 16.
37 degrees F would be unreasonable for a body temperature of a li
Re: What is an outlier ?
Of course it can be. An outlier is any value that is not usual for your data set.

"Voltolini" <[EMAIL PROTECTED]> wrote in message
news:002f01c1be21$65913d60$0fe9e3c8@oemcomputer...
> Hi,
>
> My doubt is: can an outlier be a LOW data value in the sample (and not
> just the highest)?
>
> Several textbooks don't make this clear !!!
>
> Thanks
>
> V.