Dear Isobel,

Thanks for your quick and helpful reply!

(1) I would like to trust both the accuracy and the precision of the dataset; the real problem is how we "play the computer game". The extreme values may come from samples which by chance contain many minerals.

(2) From the percentiles I provided in my message, you can see that the dataset is indeed heavily skewed. Logarithmic transformation makes some of the variables follow the "normal distribution", but not all of them. Moreover, the extreme values still look extreme in the transformed dataset.

(3) There may be two populations: "background" and "mineralised". However, there is really no way to "dichotomise" the two populations. Geographically or mathematically? Geographically, there are three areas of high values. Mathematically, we need some proof. Even if we could properly separate the dataset into two "populations", the extreme values might still be extreme within the "mineralised" population.

Since the really "bad" values account for less than 2% of the total (such as 4 or 5 values out of 223, which can also be seen from the percentiles), I am unwilling to use nonparametric methods unless we cannot find a way to use parametric methods.

Another problem arises when we carry out spatial interpolation: these values may produce artificial contour lines around their sampling locations, even though the contours can be smoothed. I don't think this reflects the real situation in the field.

Well, I am still not very confident about what the best way is ... I know the worst way is to discard these "outlying" values, and the second worst way is to use non-parametric methods.

Cheers,

Chaosheng Zhang

----- Original Message -----
From: "Isobel Clark" <[EMAIL PROTECTED]>
To: "Chaosheng Zhang" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, December 13, 2001 2:18 PM
Subject: Re: AI-GEOSTATS: Extreme values?

> > My question is: How to deal with the
> > extreme/outlying values in a data set?
>
> The real priority is to establish why you have extreme
> highs. For example:
>
> (1) is there a high imprecision in measuring the
> values, so that the sample observations are actually
> inaccurate? If so, is it relative to the value or a
> flat error?
>
> (2) do you have a skewed distribution of values?
>
> (3) do you have two (or more) populations, only one of
> which gives the high values?
>
> and there may be others. Once you determine the reason
> for extreme values, then you can more objectively know
> how to deal with them.
>
> For example, if you think (2) is most likely, then look
> at transformations or distribution-free approaches to
> geostatistics. You can find some of my papers on
> dealing with positively skewed distributions at:
>
> http://uk.geocities.com/drisobelclark/resume/Publications.html
>
> If (3) is more likely - as may be probable if you are
> looking at an area where samples may be 'background'
> or 'contaminated' - you really need to identify the
> populations first. Then you may be able to apply a
> mixture model together with indicator geostatistical
> approaches.
>
> If (1) is your problem, then you may be able to use a
> rough non-parametric approach to get to cross
> validation. The 'error statistics' in a cross
> validation exercise will often assist in identifying
> erroneous sample measurements.
>
> Hope this helps
> Isobel Clark
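
A rough illustration of point (2) in the reply above, that a log transform can reduce skewness while a handful of values still stand out. This is only a sketch under stated assumptions: the data are synthetic (218 lognormal "background" values plus 5 invented high values, 223 in total), not the dataset discussed in this thread, and the helper names `skewness` and `robust_flags` are hypothetical. The flagging rule is a median/MAD "modified z-score" with the commonly used 3.5 cut-off, which is one possible screening choice and not a method proposed by either correspondent. Flagging a value here only marks it for inspection; it does not discard it.

```python
import math
import random
import statistics


def skewness(values):
    """Skewness as the third standardized moment (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def robust_flags(values, threshold=3.5):
    """Mark values whose modified z-score (median/MAD based) exceeds threshold.

    Assumes the median absolute deviation (MAD) is non-zero.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    # 0.6745 rescales the MAD so the score is comparable to a normal z-score.
    return [0.6745 * (v - med) / mad > threshold for v in values]


# Synthetic data only (an assumption for illustration, not the real samples).
rng = random.Random(0)
background = [rng.lognormvariate(3.0, 0.5) for _ in range(218)]
extremes = [2000.0, 3500.0, 5000.0, 8000.0, 12000.0]
data = background + extremes

# Log-transform, then compare skewness before and after.
logged = [math.log(v) for v in data]
raw_skew = skewness(data)
log_skew = skewness(logged)

# Screen for extreme highs in log space, where the bulk is closer to normal.
flags = robust_flags(logged)
n_flagged = sum(flags)

print(f"skewness raw: {raw_skew:.2f}, after log: {log_skew:.2f}")
print(f"values flagged in log space: {n_flagged} of {len(data)}")
```

Working in log space with the median and MAD keeps the screening from being dominated by the very values it is trying to detect, which the mean and standard deviation would be.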