Re: AI-GEOSTATS: Extreme values?

2001-12-13 Thread Isobel Clark

 My question is: How to deal with the
 extreme/outlying values in a data set?
The real priority is to establish why you have extreme
highs. For example:

(1) is there a high imprecision in measuring the
values, so that the sample observations are actually
inaccurate? If so, is it relative to the value or a
flat error?

(2) do you have a skewed distribution of values?

(3) do you have two (or more) populations, only one of
which gives the high values?

and there may be others. Once you determine the reason
for the extreme values, you can decide more objectively
how to deal with them.

For example, if you think (2) is most likely, then look
at transformations or distribution-free approaches to
geostatistics. You can find some of my papers on
dealing with positively skewed distributions at:

http://uk.geocities.com/drisobelclark/resume/Publications.html

If (3) is more likely - as may be probable if you are
looking at an area where samples may be 'background'
or 'contaminated' - you really need to identify the
populations first. Then you may be able to apply a
mixture model together with indicator geostatistical
approaches.
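
As an illustration of the indicator idea only, here is a minimal sketch,
assuming the populations can be split at a single cut-off. The function
name, the sample values and the threshold of 100 are all hypothetical, and
the coding convention (1 above the cut-off, 0 otherwise) is an assumption;
some texts code it the other way round.

```python
def indicator_transform(values, threshold):
    """Code each sample as 1 if it exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


# Hypothetical concentrations; the cut-off of 100 is an assumed value,
# not one taken from this thread.
samples = [12.0, 15.5, 9.8, 240.0, 18.2, 310.0, 11.1]
indicators = indicator_transform(samples, threshold=100.0)

# Proportion of samples assigned to the 'high' population.
proportion_high = sum(indicators) / len(indicators)
print(indicators, proportion_high)
```

The 0/1 values can then be analysed spatially in place of the raw
measurements, so that a few very large numbers no longer dominate.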

If (1) is your problem, then you may be able to use a
rough non-parametric approach to get to cross
validation. The 'error statistics' in a cross
validation exercise will often assist in identifying
erroneous sample measurements.
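
The cross validation idea can be sketched roughly as follows. This is only
an illustration under stated assumptions: inverse distance weighting stands
in for the "rough non-parametric approach", the errors are screened with a
median/MAD score, and the function names, the cut-off of 3.5 and the sample
grid are all hypothetical.

```python
import math
import statistics


def idw_estimate(target, neighbours, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) tuples."""
    num = 0.0
    den = 0.0
    for x, y, v in neighbours:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # coincident sample: use its value directly
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def cross_validate(samples, power=2.0):
    """Leave each sample out in turn; return the (actual - estimate) errors."""
    errors = []
    for i, (x, y, v) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        errors.append(v - idw_estimate((x, y), others, power))
    return errors


def flag_suspects(errors, cutoff=3.5):
    """Return indices whose robust (median/MAD) score exceeds `cutoff`."""
    med = statistics.median(errors)
    mad = statistics.median([abs(e - med) for e in errors])
    if mad == 0.0:
        return []
    return [i for i, e in enumerate(errors)
            if 0.6745 * abs(e - med) / mad > cutoff]


# Hypothetical 3 x 3 grid: background near 10 with one high value in the centre.
samples = [(0, 0, 10), (1, 0, 11), (2, 0, 9),
           (0, 1, 10), (1, 1, 100), (2, 1, 12),
           (0, 2, 11), (1, 2, 10), (2, 2, 9)]
errors = cross_validate(samples)
suspects = flag_suspects(errors)
print(suspects)
```

A flagged sample is only a candidate for checking, not proof of a
measurement error; a genuine high value will be flagged in the same way.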

Hope this helps
Isobel Clark





--
* To post a message to the list, send it to [EMAIL PROTECTED]
* As a general service to the users, please remember to post a summary of any useful 
responses to your questions.
* To unsubscribe, send an email to [EMAIL PROTECTED] with no subject and unsubscribe 
ai-geostats followed by end on the next line in the message body. DO NOT SEND 
Subscribe/Unsubscribe requests to the list
* Support to the list is provided at http://www.ai-geostats.org



Re: AI-GEOSTATS: Extreme values?

2001-12-13 Thread Chaosheng Zhang

Dear Isobel,

Thanks for your quick and helpful reply!

(1) I would like to trust both the accuracy and precision of the dataset,
and the real problem is how we play the computer game. The extreme values
may come from samples that by chance contain many minerals.

(2) From the percentiles I provided in my message, you can see that the
dataset is indeed heavily skewed. Logarithmic transformation can make
some of the variables follow the normal distribution, but not all.
However, the extreme values still look extreme in the transformed dataset.
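
A quick way to check how much a logarithmic transformation helps is to
compare the skewness before and after. The sketch below is illustrative
only: the values are hypothetical (they are not the dataset discussed
here), and the moment coefficient of skewness is just one of several
possible measures.

```python
import math


def skewness(values):
    """Population moment coefficient of skewness (third moment / sd cubed)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical positive values with two extremes (not the data from this thread).
raw = [3, 4, 4, 5, 5, 6, 7, 8, 10, 12, 15, 400, 900]
logged = [math.log(v) for v in raw]

print(skewness(raw), skewness(logged))
```

For these made-up values the transformation reduces the skewness but
leaves it clearly positive, which is the situation described above: the
extremes remain extreme on the log scale.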

(3) There may be two populations: background and mineralised. However,
there is really no clear way to separate the two populations. Should it
be done geographically or mathematically? Geographically, there are three
areas of high values. Mathematically, we need some proof. Even if we could
properly separate the dataset into two populations, the extreme values may
still be extreme in the mineralised population.

Since the really bad values are only about 2% of the total (4 or 5 values
out of 223, which can also be seen from the percentiles), I am unwilling
to use non-parametric methods unless we cannot find a way to use the
parametric methods.

Another problem is that when we carry out spatial interpolation, these
values may produce artificial contour lines around their sampling
locations, even though they can be smoothed. I don't think this reflects
the real situation in the field.

Well, I am still not very confident about what the best way should be ...
I know the worst way is to discard these outlying values, and the second
worst way is to use non-parametric methods.

Cheers,

Chaosheng Zhang


- Original Message -
From: Isobel Clark [EMAIL PROTECTED]
To: Chaosheng Zhang [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, December 13, 2001 2:18 PM
Subject: Re: AI-GEOSTATS: Extreme values?







