Dear all,
 
My question is: how should one deal with extreme/outlying values in a data set?
 
I am working with heavy metal concentrations in soils from a mine area. There are 223 samples, evenly distributed in space at a sampling interval of 400 metres. Several samples have extremely high values, which makes me uncomfortable. The percentiles of the data set are listed below (in mg/kg):
 
               Zn    Cu     Pb     Cd    As
        Min     4     1     25    0.0     2
         5%    35     6     35    0.1     6
        10%    40     7     41    0.2     7
        25%    65    13     62    0.3     9
        50%   122    18    168    0.6    15
        75%   338    27    821    1.5    28
        90%   907    56   2799    2.8    58
        95%  1986   116   4490    4.2    80
        96%  2462   151   4698    4.9    82
        97%  3493   178   5413    6.2    91
        98%  4697   207   7609    8.3   111
        99%  6712   247  11750   12.4   184
        Max 11473  1293  16305   48.5  1060
For geostatistical and statistical analyses, we need some confidence in how to handle these extremely high values, which account for less than 2% of the samples.
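
To put a rough number on how skewed the data are, here is a minimal Python sketch that uses only the 25% and 75% values from the table above. The 1.5 x IQR upper fence (Tukey's rule) and its log10 variant are assumptions chosen purely for illustration, not a method I have settled on, and the names upper_fence, log_upper_fence and QUARTILES are hypothetical.

```python
import math

N_SAMPLES = 223  # number of soil samples stated above

# (25th percentile, 75th percentile) in mg/kg, copied from the table above.
QUARTILES = {
    "Zn": (65, 338),
    "Cu": (13, 27),
    "Pb": (62, 821),
    "Cd": (0.3, 1.5),
    "As": (9, 28),
}


def upper_fence(q25, q75, k=1.5):
    """Tukey-style upper fence on the raw scale (assumed rule, k = 1.5)."""
    return q75 + k * (q75 - q25)


def log_upper_fence(q25, q75, k=1.5):
    """Same fence computed on log10 values, then back-transformed to mg/kg."""
    lo, hi = math.log10(q25), math.log10(q75)
    return 10 ** (hi + k * (hi - lo))


# Largest whole number of samples that is still under 2% of the total.
top_2pct = math.floor(0.02 * N_SAMPLES)
print(f"2% of {N_SAMPLES} samples is at most {top_2pct} samples")

for metal, (q25, q75) in QUARTILES.items():
    raw = upper_fence(q25, q75)
    logged = log_upper_fence(q25, q75)
    print(f"{metal}: raw fence {raw:.1f} mg/kg, log-scale fence {logged:.1f} mg/kg")
```

For Zn the raw-scale fence works out to 747.5 mg/kg, which is already below the 90% value (907 mg/kg), so on the raw scale this rule would flag more than 10% of the samples rather than only the top 2%. That is part of why I am unsure how to treat these values.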
 
Any suggestions?

Cheers,
 
Chaosheng Zhang
===================================
Dr. Chaosheng Zhang
Department of Geography
National University of Ireland
Galway
IRELAND

Tel: +353-91-524411 ext. 2375
Fax: +353-91-525700
Email: [EMAIL PROTECTED]
===================================
 
