Dear Isobel,

Thanks for your quick and helpful reply!

(1) I would like to trust both the accuracy and the precision of the dataset;
the real problem is how we "play the computer game". The extreme values
may come from samples which by chance contain many minerals.

(2) From the percentiles I provided in my message, you can see that the
dataset is indeed heavily skewed. A logarithmic transformation makes
some of the variables follow a "normal distribution", but not all.
Moreover, the extreme values still look extreme in the transformed dataset.
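
A minimal sketch of the check described in (2), using made-up numbers rather
than the real dataset. The values in `raw` and the helper names `skewness` and
`max_z_score` are assumptions for illustration only. It compares the skewness
before and after a log10 transformation, and reports how many standard
deviations the largest value sits above the mean in each case.

```python
import math
import statistics

def skewness(values):
    """Moment skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def max_z_score(values):
    """Distance of the largest value from the mean, in standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return (max(values) - mean) / sd

# Hypothetical concentrations (NOT the real dataset): a skewed background
# plus a few very high values.
raw = [5, 6, 7, 8, 8, 9, 10, 11, 12, 13, 15, 17, 20, 24, 30, 40,
       900, 1500, 2500]
logged = [math.log10(v) for v in raw]

print(f"raw:   skewness={skewness(raw):.2f}, max z={max_z_score(raw):.2f}")
print(f"log10: skewness={skewness(logged):.2f}, max z={max_z_score(logged):.2f}")
```

With numbers like these the transformation reduces the skewness, but the
largest values can remain well separated from the bulk of the data, which is
the situation described above.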

(3) There may be two populations: "background" and "mineralised". However,
there is no obvious way to "dichotomise" them: should it be done geographically
or mathematically? Geographically, there are three areas of high values.
Mathematically, we need some proof. Even if we could properly separate
the dataset into two "populations", the extreme values might still be extreme
in the "mineralised" population.

Since the really "bad" values are only about 2% of the total (4 or
5 values out of 223, which can also be seen from the
percentiles), I am unwilling to use nonparametric methods unless we cannot
find a way to use parametric methods.

Another problem is that when we carry out spatial interpolation, these values
may produce artificial contour lines around their sampling locations, even
though they can be smoothed. I don't think this reflects the real situation
in the field.
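
To illustrate the contouring problem, here is a small sketch with invented
data. It uses plain inverse-distance weighting as a simple stand-in for
whatever interpolator is actually used (it is not kriging), and the grid,
the values and the function name `idw` are all assumptions. One extreme sample
in an otherwise flat background pulls up the estimates around it, so the
contours would close around that single location.

```python
import math

def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples is a list of (sx, sy, value) tuples.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exactly on a sample: return the measured value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical 5 x 5 grid of background samples (value 10) ...
samples = [(float(i), float(j), 10.0) for i in range(5) for j in range(5)]
# ... with the centre sample (2, 2) replaced by one extreme value.
samples[12] = (2.0, 2.0, 1000.0)

near = idw(2.1, 2.0, samples)  # right next to the extreme sample
mid = idw(2.5, 2.5, samples)   # centre of an adjacent grid cell
far = idw(0.5, 0.5, samples)   # towards a corner of the grid

print(f"near={near:.1f}  mid={mid:.1f}  far={far:.1f}  (background = 10)")
```

The estimates fall off steadily with distance from the single high sample,
so the map shows a "bull's-eye" there regardless of how the surface is
smoothed afterwards.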

Well, I am still not sure what the best approach is ... I know
the worst way is to discard these "outlying" values, and the second worst
way is to use nonparametric methods.

Cheers,

Chaosheng Zhang


----- Original Message -----
From: "Isobel Clark" <[EMAIL PROTECTED]>
To: "Chaosheng Zhang" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, December 13, 2001 2:18 PM
Subject: Re: AI-GEOSTATS: Extreme values?


> > My question is: How to deal with the
> > extreme/outlying values in a data set?
> The real priority is to establish why you have extreme
> highs. For example:
>
> (1) is there a high imprecision in measuring the
> values, so that the sample observations are actually
> inaccurate? If so, is it relative to the value or a
> flat error?
>
> (2) do you have a skewed distribution of values?
>
> (3) do you have two (or more) populations, only one of
> which gives the high values?
>
> and there may be others. Once you determine the reason
> for extreme values, then you can more objectively know
> how to deal with them.
>
> For example, if you think (2) is most likely then look
> at transformations or distribution-free approaches to
> geostatistics. You can find some of my papers on
> dealing with positively skewed distributions at:
>
> http://uk.geocities.com/drisobelclark/resume/Publications.html
>
> If (3) is more likely - as may be probable if you are
> looking at an area where samples may be 'background'
> or 'contaminated' - you really need to identify the
> populations first. Then you may be able to apply a
> mixture model together with indicator geostatistical
> approaches.
>
> If (1) is your problem, then you may be able to use a
> rough non-parametric approach to get to cross
> validation. The 'error statistics' in a cross
> validation exercise will often assist in identifying
> erroneous sample measurements.
>
> Hope this helps
> Isobel Clark


