Re: AI-GEOSTATS: Extreme values?
Dear Chaosheng Zhang,

The sampling interval is so wide that the high values could easily be related to "hot spots" of higher-grade contamination, i.e. dumping areas for particular kinds of slags, mineralized waste, etc. A property map might help.

Have you contoured the data? If so, keep in mind that real hot spots of environmental significance might not show up in the 2D distribution on such a wide sampling grid.

Regards,

Marcel Vallée, Eng., Geo.
Geoconseil Marcel Vallée Inc.
706 Routhier Ave
Québec, Québec G1X 3J9
Canada
Tel: (1) 418 652 3497
Fax: (1) 418 652 9148
Email: [EMAIL PROTECTED]

==
13/12/01 08:01:48, Chaosheng Zhang <[EMAIL PROTECTED]> wrote:

> Date: Thu, 13 Dec 2001 13:01:48 +
> From: Chaosheng Zhang <[EMAIL PROTECTED]>
> Subject: AI-GEOSTATS: Extreme values?
> To: [EMAIL PROTECTED]
>
> Dear all,
>
> My question is: How to deal with the extreme/outlying values in a data set?
>
> I am dealing with heavy metal concentrations in soils from a mine area. The
> sample number is 223, and the samples are spatially evenly distributed with
> the sampling interval of 400 metres. There are several samples with
> extremely high values, which makes me feel uncomfortable. The percentiles of
> the dataset are listed as follows (in mg/kg):
>
>          Zn     Cu     Pb     Cd     As
> Min       4      1     25    0.0      2
> 5%       35      6     35    0.1      6
> 10%      40      7     41    0.2      7
> 25%      65     13     62    0.3      9
> 50%     122     18    168    0.6     15
> 75%     338     27    821    1.5     28
> 90%     907     56   2799    2.8     58
> 95%    1986    116   4490    4.2     80
> 96%    2462    151   4698    4.9     82
> 97%    3493    178   5413    6.2     91
> 98%    4697    207   7609    8.3    111
> 99%    6712    247  11750   12.4    184
> Max   11473   1293  16305   48.5   1060
>
> When doing geostatistical and statistical analyses, we need some confidence
> in dealing with these very high extreme values, which account for less
> than 2% of the total sample number.
>
> Any suggestions?
>
> Cheers,
>
> Chaosheng Zhang
> ===
> Dr. Chaosheng Zhang
> Department of Geography
> National University of Ireland
> Galway
> IRELAND
>
> Tel: +353-91-524411 ext. 2375
> Fax: +353-91-525700
> Email: [EMAIL PROTECTED]
> ===

--
* To post a message to the list, send it to [EMAIL PROTECTED]
* As a general service to the users, please remember to post a summary of any useful responses to your questions.
* To unsubscribe, send an email to [EMAIL PROTECTED] with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list
* Support to the list is provided at http://www.ai-geostats.org
Re: AI-GEOSTATS: interannual spatial "stability" of variable
Chris

There are two ways of approaching data which has a time element:

(1) treat time as a co-variable and use co-kriging. You would probably want to do this if you have more than one variable anyway

(2) treat time as a dimension -- as an additional co-ordinate. If your original data is two-dimensional, you can use any normal 3d geostat package for this. If your original data is already 3d, things get a little more complicated.

Good place to start would be Geostat Congress Volumes or Noel Cressie's book.

Isobel Clark
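Option (2) can be illustrated with a minimal sketch: each year's 2D points are stacked into (x, y, t) points, and an omnidirectional experimental semivariogram is computed in the combined space. The function names, the `metres_per_year` scaling factor that converts time to a distance-like coordinate, and the lag binning are illustrative assumptions, not something the message specifies; a real 3d geostat package would also let you model anisotropy between space and time.

```python
import math
from itertools import combinations


def stack_years(yearly, metres_per_year):
    """Turn {year: [(x, y, value), ...]} into a list of (x, y, t, value).

    The time coordinate t is (year - first year) * metres_per_year, where
    metres_per_year is an assumed scaling factor chosen by the analyst.
    """
    first = min(yearly)
    return [
        (x, y, (year - first) * metres_per_year, value)
        for year in sorted(yearly)
        for x, y, value in yearly[year]
    ]


def semivariogram(points, lag_width, n_lags):
    """Experimental semivariogram over 3D separation distance.

    Returns one value per lag bin: sum of squared differences divided by
    twice the number of pairs, or None where a bin holds no pairs.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for (x1, y1, t1, v1), (x2, y2, t2, v2) in combinations(points, 2):
        h = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (t1 - t2) ** 2)
        k = int(h // lag_width)
        if k < n_lags:
            sums[k] += (v1 - v2) ** 2
            counts[k] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Hypothetical two-year data set on the same two locations.
yearly = {
    2000: [(0.0, 0.0, 1.0), (100.0, 0.0, 2.0)],
    2001: [(0.0, 0.0, 1.0), (100.0, 0.0, 2.0)],
}
points = stack_years(yearly, metres_per_year=100.0)
gamma = semivariogram(points, lag_width=50.0, n_lags=4)
```

The choice of `metres_per_year` controls how strongly a one-year gap counts compared with a spatial gap, so it is worth trying several values before drawing conclusions.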
AI-GEOSTATS: interannual spatial "stability" of variable
Hi,

we have multiyear point data (gridded) of a measured variable, Y, from farm fields and there are spatial patterns to Y. We want to measure the degree of similarity of these patterns from year to year. Could someone please point me in a direction or provide references?

thanks,
chris
Re: AI-GEOSTATS: Extreme values?
Dear Isobel,

Thanks for your quick and helpful reply!

(1) I would like to trust both the accuracy and precision of the dataset, and the real problem is how we "play the computer game". The extreme values may come from samples which by chance contain many minerals.

(2) From the percentiles I provided in the message, you can see that the dataset is indeed heavily skewed. Logarithmic transformation can make some of the variables follow the "normal distribution", but not all. However, the extreme values still look extreme in the transformed dataset.

(3) There may be two populations: "background" and "mineralised". However, there is really no way to "dichotomise" the two populations. Geographically or mathematically? Geographically, there are three areas of high values. Mathematically, we need some proof. Even if we could properly separate the dataset into two "populations", the extreme values may still be extreme in the "mineralised" population.

Since the really "bad" values are only <2% of the total number (such as 4 or 5 values out of the total number of 223, which can also be seen from the percentiles), I am unwilling to use nonparametric methods unless we cannot find a way to use the parametric methods.

Another problem is that when we carry out spatial interpolation, these values may produce artificial contour lines around these sampling locations, even though they can be smoothed. I don't think this reflects the real situation in the field.

Well, I am still not very confident what the best way should be ... I know the worst way is to discard these "outlying" values, and the second worst way is to use non-parametric methods.

Cheers,

Chaosheng Zhang

- Original Message -
From: "Isobel Clark" <[EMAIL PROTECTED]>
To: "Chaosheng Zhang" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, December 13, 2001 2:18 PM
Subject: Re: AI-GEOSTATS: Extreme values?

> > My question is: How to deal with the
> > extreme/outlying values in a data set?
>
> The real priority is to establish why you have extreme
> highs. For example:
>
> (1) is there a high imprecision in measuring the
> values, so that the sample observations are actually
> inaccurate? If so, is it relative to the value or a
> flat error?
>
> (2) do you have a skewed distribution of values?
>
> (3) do you have two (or more) populations, only one of
> which gives the high values?
>
> and there may be others. Once you determine the reason
> for extreme values, then you can more objectively know
> how to deal with them.
>
> For example, if you think (2) is most likely then look
> at transformations or distribution-free approaches to
> geostatistics. You can find some of my papers in
> dealing with positively skewed distributions at:
>
> http://uk.geocities.com/drisobelclark/resume/Publications.html
>
> If (3) is more likely - as may be probable if you are
> looking at an area where samples may be 'background'
> or 'contaminated' - you really need to identify the
> populations first. Then you may be able to apply a
> mixture model together with indicator geostatistical
> approaches.
>
> If (1) is your problem, then you may be able to use a
> rough non-parametric approach to get to cross
> validation. The 'error statistics' in a cross
> validation exercise will often assist in identifying
> erroneous sample measurements.
>
> Hope this helps
> Isobel Clark
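Point (2) above, that extremes can stay extreme after a logarithmic transformation, can be checked roughly from the reported percentiles alone. The sketch below measures how many interquartile ranges each maximum lies above the 75th percentile, on the raw and on the log10 scale. The 1.5 IQR fence (Tukey's boxplot rule) is an assumption brought in for illustration; it is not a method proposed in the thread, and the function names are hypothetical.

```python
import math

# (25%, 75%, Max) in mg/kg, copied from the percentile table in the thread.
REPORTED = {
    "Zn": (65, 338, 11473),
    "Cu": (13, 27, 1293),
    "Pb": (62, 821, 16305),
    "Cd": (0.3, 1.5, 48.5),
    "As": (9, 28, 1060),
}


def iqrs_above_q3(q1, q3, value, log=False):
    """Distance of value above Q3, expressed in interquartile ranges."""
    if log:
        q1, q3, value = (math.log10(v) for v in (q1, q3, value))
    return (value - q3) / (q3 - q1)


def flag_log_outliers(fence=1.5):
    """True where the maximum lies beyond Q3 + fence * IQR on the log10 scale."""
    return {
        element: iqrs_above_q3(q1, q3, maximum, log=True) > fence
        for element, (q1, q3, maximum) in REPORTED.items()
    }


flags = flag_log_outliers()
raw_zn = iqrs_above_q3(*REPORTED["Zn"])
log_zn = iqrs_above_q3(*REPORTED["Zn"], log=True)
```

On the reported values, the Pb maximum falls inside the fence on the log scale while the Zn, Cu, Cd and As maxima stay outside it, which agrees with the remark that the transformation works for some variables but not all. This uses only five summary numbers per element, so it is a screening aid and not a test of normality.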
Re: AI-GEOSTATS: Extreme values?
> My question is: How to deal with the
> extreme/outlying values in a data set?

The real priority is to establish why you have extreme highs. For example:

(1) is there a high imprecision in measuring the values, so that the sample observations are actually inaccurate? If so, is it relative to the value or a flat error?

(2) do you have a skewed distribution of values?

(3) do you have two (or more) populations, only one of which gives the high values?

and there may be others. Once you determine the reason for extreme values, then you can more objectively know how to deal with them.

For example, if you think (2) is most likely then look at transformations or distribution-free approaches to geostatistics. You can find some of my papers in dealing with positively skewed distributions at:

http://uk.geocities.com/drisobelclark/resume/Publications.html

If (3) is more likely - as may be probable if you are looking at an area where samples may be 'background' or 'contaminated' - you really need to identify the populations first. Then you may be able to apply a mixture model together with indicator geostatistical approaches.

If (1) is your problem, then you may be able to use a rough non-parametric approach to get to cross validation. The 'error statistics' in a cross validation exercise will often assist in identifying erroneous sample measurements.

Hope this helps
Isobel Clark
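The cross validation idea for case (1) can be sketched as a leave-one-out exercise: each sample is estimated from all the others and the estimation error is inspected. Inverse-distance weighting is used here only as a stand-in for a "rough non-parametric approach"; that choice, the function names, and the 3 x 3 grid of made-up values at 400 m spacing are all assumptions for illustration.

```python
import math


def idw_estimate(target, samples, power=2.0):
    """Inverse-distance-weighted estimate at target (x, y) from (x, y, value) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # coincident sample: use its value directly
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


def loo_errors(samples, power=2.0):
    """Leave-one-out errors: (estimate from the other samples) - (measured value)."""
    errors = []
    for i, (x, y, value) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        errors.append(idw_estimate((x, y), others, power) - value)
    return errors


# Hypothetical data: background of 100 mg/kg with one very high centre sample.
samples = [
    (col * 400.0, row * 400.0, 5000.0 if (row, col) == (1, 1) else 100.0)
    for row in range(3)
    for col in range(3)
]
errors = loo_errors(samples)
suspect = max(range(len(errors)), key=lambda i: abs(errors[i]))
```

A large error only says that a sample disagrees with its neighbours. Whether that reflects a measurement problem, a second population, or a genuine hot spot still has to be decided from the field context.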
AI-GEOSTATS: Extreme values?
Dear all,

My question is: How to deal with the extreme/outlying values in a data set?

I am dealing with heavy metal concentrations in soils from a mine area. The sample number is 223, and the samples are spatially evenly distributed with the sampling interval of 400 metres. There are several samples with extremely high values, which makes me feel uncomfortable. The percentiles of the dataset are listed as follows (in mg/kg):

         Zn     Cu     Pb     Cd     As
Min       4      1     25    0.0      2
5%       35      6     35    0.1      6
10%      40      7     41    0.2      7
25%      65     13     62    0.3      9
50%     122     18    168    0.6     15
75%     338     27    821    1.5     28
90%     907     56   2799    2.8     58
95%    1986    116   4490    4.2     80
96%    2462    151   4698    4.9     82
97%    3493    178   5413    6.2     91
98%    4697    207   7609    8.3    111
99%    6712    247  11750   12.4    184
Max   11473   1293  16305   48.5   1060

When doing geostatistical and statistical analyses, we need some confidence in dealing with these very high extreme values, which account for less than 2% of the total sample number.

Any suggestions?

Cheers,

Chaosheng Zhang
===
Dr. Chaosheng Zhang
Department of Geography
National University of Ireland
Galway
IRELAND
Tel: +353-91-524411 ext. 2375
Fax: +353-91-525700
Email: [EMAIL PROTECTED]
===
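As a small worked check on the figures in this message, the sketch below converts a percentile into an approximate count of samples above it for 223 samples, and computes the ratio of maximum to median for each element. The values are copied from the table; the function names are hypothetical.

```python
N_SAMPLES = 223
ELEMENTS = ("Zn", "Cu", "Pb", "Cd", "As")

# Rows copied from the percentile table (mg/kg).
MEDIAN = (122, 18, 168, 0.6, 15)
MAXIMUM = (11473, 1293, 16305, 48.5, 1060)


def samples_above(percentile, n=N_SAMPLES):
    """Approximate number of samples above a percentile given on a 0-100 scale."""
    return (100 - percentile) / 100 * n


def max_to_median():
    """Ratio of the maximum to the median for each element."""
    return {
        element: maximum / median
        for element, median, maximum in zip(ELEMENTS, MEDIAN, MAXIMUM)
    }


above_98 = samples_above(98)
ratios = max_to_median()
```

With 223 samples, the top 2% corresponds to between four and five samples, so the "extreme" tail discussed here is a handful of points whose locations can be inspected one by one.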