Re: What is an outlier ?

2002-02-26 Thread Jon Cryer

and with bivariate data, neither component need be high or low!

Jon Cryer
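
A minimal sketch of the bivariate point above, in Python. The data are made up for illustration, and the Mahalanobis distance is one common way to measure how unusual a pair is given the joint pattern; nothing in the thread says which measure was intended. The last pair has x and y values well inside the range of the others, yet it breaks the joint pattern.

```python
import math
import statistics

def mahalanobis_distances(points):
    """Mahalanobis distance of each (x, y) pair from the sample mean."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample variances and covariance (divisor n - 1).
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances

# Hypothetical data: y tracks x closely, except for the last pair (7, 4).
points = [(1, 1.2), (2, 1.9), (3, 3.1), (4, 3.8), (5, 5.2), (6, 5.9),
          (7, 7.1), (8, 7.8), (9, 9.1), (10, 9.9), (7, 4.0)]
xs = [x for x, _ in points]
ys = [y for _, y in points]

distances = mahalanobis_distances(points)
worst = max(range(len(points)), key=distances.__getitem__)

# Univariate z-scores of the most unusual pair: neither is large.
z_x = (xs[worst] - statistics.fmean(xs)) / statistics.stdev(xs)
z_y = (ys[worst] - statistics.fmean(ys)) / statistics.stdev(ys)
print(points[worst], round(distances[worst], 2), round(z_x, 2), round(z_y, 2))
```

Looking at x alone or y alone would not single out the last pair; only the joint distance does.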

At 12:14 PM 2/25/2002 -0700, you wrote:
>Of course it can be. An outlier is any value that is not usual for your data
>set.
>"Voltolini" <[EMAIL PROTECTED]> wrote in message
>news:002f01c1be21$65913d60$0fe9e3c8@oemcomputer...
> > Hi,
> >
> >
> > My question is: can an outlier be a LOW data value in the sample (and not
> > just the highest)?
> >
> > Several textbooks don't make this clear!
> >
> >
> > Thanks
> >
> >
> > V.
> >
> >
> >
> > =
> > Instructions for joining and leaving this list, remarks about the
> > problem of INAPPROPRIATE MESSAGES, and archives are available at
> >   http://jse.stat.ncsu.edu/
> > =
>
>
>
>







Re: What is an outlier ?

2002-02-25 Thread Glen Barnett

Voltolini wrote:
> 
> Hi,
> 
> My question is: can an outlier be a LOW data value in the sample (and not
> just the highest)?
> 
> Several textbooks don't make this clear!

What makes an outlier "an outlier" is your model. If your model accounts
for all the observations, you can't really call any of them an outlier.
If your model adequately accounts for all but one or two unusual
observations, you might regard them as coming from some process other
than that which generated the data your model accounts for, and call them
outliers.

Such "not adequately accounted for" observations may be low
observations, or high observations, or they may actually turn out to be
somewhere in the middle of the range of your data. I have seen this with
time series, for example, where in some applications an autoregressive
model was a very good description of a long series, apart from a few
outliers in the first quarter or so of the time period (which did in the
end turn out to have come from a different process, because the protocol
wasn't always being properly followed early on). Two of those "outliers" -
in the sense that the model didn't adequately account for them - turned
out to be neither particularly high nor particularly low observations,
but they were substantially higher or lower than expected from the model.
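
As a rough sketch of the time series case, assuming a least-squares AR(1) fit and a cut-off of 3 residual standard deviations (both are illustrative choices, and the series is made up, not the data described above): a slowly rising and falling series has one value replaced by a number from the middle of its range at a moment when the model expects a peak.

```python
import statistics

def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    mean_prev = statistics.fmean(prev)
    mean_curr = statistics.fmean(curr)
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = sxy / sxx
    return mean_curr - phi * mean_prev, phi

def flag_ar1_outliers(series, threshold=3.0):
    """Time indices whose one-step-ahead residual is unusually large."""
    c, phi = fit_ar1(series)
    residuals = [series[t] - (c + phi * series[t - 1])
                 for t in range(1, len(series))]
    scale = statistics.stdev(residuals)
    return [t for t, r in enumerate(residuals, start=1)
            if abs(r) > threshold * scale]

# Hypothetical series: a triangle wave between 0 and 20 with period 40.
series = [20 - abs(t % 40 - 20) for t in range(121)]
series[60] = 10  # mid-range value where the model expects about 20

flagged = flag_ar1_outliers(series)
print(flagged, [series[t] for t in flagged])
```

The replaced value, 10, sits exactly in the middle of the range and is still flagged. The observation right after it is flagged too, because its prediction is built on the corrupted value; that side effect is worth remembering when reading residual plots.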

Another case where you might have "outliers" in the middle of your data
is in a regression context, where a generally increasing relationship
shows a tight, Gaussian-looking random scatter about the relationship,
but with a couple of relatively low y-values at some of the higher
x-values. The observations themselves may actually be very close to the
mean of the y's, but the model of the relationship makes them "unusual".
A different model - for example, one where the observations come from a
distribution which has the same expectation as a function of x, but
which has a heavier left tail around that expectation - might account
for all the data and not find any outliers.
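
The regression case can be sketched the same way. The data, the ordinary least-squares line and the cut-off of 2 residual standard deviations are all assumptions made for illustration. Two y-values at high x are set close to the overall mean of y, so they look ordinary on their own but sit far below the fitted line.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def flag_regression_outliers(xs, ys, threshold=2.0):
    """x-values whose residual exceeds threshold residual standard deviations."""
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    scale = statistics.stdev(residuals)
    return [x for x, r in zip(xs, residuals) if abs(r) > threshold * scale]

# Hypothetical data: y is about 2x with a small fixed jitter pattern.
jitter = [0.3, -0.4, 0.1, -0.2, 0.5, -0.1, -0.3, 0.2, 0.4, -0.5]
xs = list(range(1, 31))
ys = [2 * x + jitter[(x - 1) % 10] for x in xs]

# Two relatively low y-values at high x, both near the overall mean of y.
ys[xs.index(24)] = 31.0
ys[xs.index(27)] = 30.0

flagged = flag_regression_outliers(xs, ys)
mean_y = statistics.fmean(ys)
print(flagged, round(mean_y, 2))
```

The cut-off is deliberately loose: the two odd points pull the fitted line towards themselves and inflate the residual spread, so a stricter rule on this fit can miss one of them. A robust fit, or a model with a heavier left tail as described above, would treat the same data differently.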

Glen





Re: What is an outlier ?

2002-02-25 Thread Dennis Roberts

of course, if one has control over the data, checking the coding and making 
sure it is correct is a good thing to do

if you do not have control over that, then there may be very little you can 
do with it and in fact, you may be totally UNaware of an outlier problem

i see it as a potentially MUCH larger problem when ONLY certain summary 
statistics are shown without any basic tallies/graphs displayed so, IF 
there are some really strange outlier values, they usually will go undetected ...

correlations are ONE good case in point ... have a look at the following 
scatterplot ... height in inches and weight in pounds ... from the pulse 
data set in minitab


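[Minitab character scatterplot, layout lost in the archive: Weight (pounds,
vertical axis, marked at 150 and 300) against Height (inches, horizontal
axis, 32.0 to 72.0). The points cluster around the 150 mark, with one
isolated point plotted above 300.]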

now, the actual r between the X and Y is -.075 ... and of course, this 
seems strange but, IF you had only seen this in a matrix of r values ... 
you might say that perhaps there was serious range restriction that more or 
less wiped out the r in this case ...  but even the desc. stats might not 
adequately tell you of this problem

IF you had the scatterplot, you probably would figure out REAL quick that 
there is a PROBLEM with one of the data points ...

in fact, without that one weird data point, the r is about .8 ... which 
makes a lot better sense when correlating heights and weights of college 
students
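
a small sketch of the same effect in Python, using made-up heights and weights (NOT the minitab pulse data) and one hypothetical miscoded weight: a single bad point is enough to turn a strong positive r into a slightly negative one.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (inches) and weights (pounds).
heights = [62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73]
weights = [120, 135, 118, 140, 150, 138, 160, 152, 170, 158, 180, 175]
r_clean = pearson_r(heights, weights)

# Add one hypothetical miscoded case: height 62, weight entered as 400.
r_with_bad_point = pearson_r(heights + [62], weights + [400])

print(round(r_clean, 2), round(r_with_bad_point, 2))
```

the r value alone gives no hint of which situation you are in; a plot or a simple tally of the weights shows the stray value at once.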


At 09:06 PM 2/25/02 +, Art Kendall wrote:

>An "outlier" is any value for a variable that is suspect given the
>measurement system, "common sense",  other values for the variable in
>the data set, or  the values a case has on other variables.

Dennis Roberts, 208 Cedar Bldg., University Park PA 16802

WWW: http://roberts.ed.psu.edu/users/droberts/drober~1.htm
AC 8148632401






Re: What is an outlier ? cont'd

2002-02-25 Thread Art Kendall



That being said, occasions can arise where there are outliers other than
from measurement or data entry error. Different disciplines have different
approaches.
What discipline are you studying? What is the variable you are concerned
about?  How is it measured?

some examples of low values:
10 pounds would be a suspicious value for an adult's weight.
Few college students are under 16.
37 degrees F would be unreasonable for a body temperature of a li








Re: What is an outlier ?

2002-02-25 Thread IPEK

Of course it can be. An outlier is any value that is not usual for your data
set.
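
One possible way to make "not usual" concrete is the boxplot rule (Tukey's fences), sketched below with made-up adult weights in pounds. The rule and the multiplier 1.5 are a common convention, not something this thread prescribes; the point is only that the rule is symmetric, so it can flag a low value just as readily as a high one.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical adult weights in pounds, with one very low and one very high value.
weights = [10, 128, 135, 140, 142, 150, 155, 158, 160, 165, 172, 180, 310]

lower, upper = tukey_fences(weights)
low_outliers = [w for w in weights if w < lower]
high_outliers = [w for w in weights if w > upper]
print(lower, upper, low_outliers, high_outliers)
```

Different quartile conventions move the fences slightly, so borderline values may be flagged by one package and not by another.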
"Voltolini" <[EMAIL PROTECTED]> wrote in message
news:002f01c1be21$65913d60$0fe9e3c8@oemcomputer...
> Hi,
>
>
> My question is: can an outlier be a LOW data value in the sample (and not
> just the highest)?
>
> Several textbooks don't make this clear!
>
>
> Thanks
>
>
> V.
>
>
>



