> Hi,
>
> I am working myself with pollution data in soils and i have very high
> values very close to very low values, and highly skewed
> distribution. I am more and more concerned with doing kriging on
> transformed data. This simply means we believe the data came
> from only one population. But what if it comes from 2 different
> populations representing 2 different polluting processes? Much
> more if we do believe there are no gross error measurements. The
> fact that high values are very close to low values would tell me that
> the spatial autocorrelation is violated locally. I would try first to see
> if the outliers (local and global) represent a different population, if
> these values cluster or not, how significant is the association high-
> low values, and if the global Moran's I increases if i eliminate the
> "outliers". Maybe the majority of the data which have a higher
> spatial autocorrelation belong to a "better expressed" diffusive
> process, (maybe an older one) while the rest of the data which
> were identified as outliers before, represent a more patch-y or point
> source pollution process which didn't have time to diffuse over the
> entire study area (a younger process, maybe?).

Exploratory analysis of the frequency distribution of the data (i.e. the
aggregated, non-spatial, frequency) could reveal the existence of two (or
more) populations. To evaluate the evidence in favour of such a
hypothesis, you could compare the hypothesis that the frequency
distribution is formed by a mixture of two (or more) specified
distributions versus the hypothesis that it is formed by only one. The
general topic in statistics is called 'mixture distribution analysis' (not
to be confused with 'mixture models'). Useful references are:

Everitt & Hand, 1981, Mixture distribution analysis. Chapman & Hall
Chen & Chen, 2001, Statistics and Probability Letters 52:125
Hawkins et al., 2001, Computational Statistics & Data Analysis 38:15
http://www.math.mcmaster.ca/peter/mix/mix.html

Some robust regression methods, for example, are based on treating the
data as coming from a mixture of two distributions, the main one, and a
contaminating distribution.
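
As a rough illustration of the one-versus-two comparison described above,
here is a minimal sketch in Python. Everything in it is my own assumption
for the sake of the example: the data are synthetic log-concentrations, the
components are taken to be normal on the log scale, the two-component fit
uses a plain EM loop, and the comparison uses BIC as a screening criterion
(lower is better). It is not the procedure of any of the references above,
and a formal test of homogeneity needs more care than this.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_single(xs):
    """Maximum-likelihood fit of a single normal distribution."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    loglik = sum(math.log(max(normal_pdf(x, mu, sigma), 1e-300)) for x in xs)
    return mu, sigma, loglik


def fit_two(xs, n_iter=300):
    """EM fit of a two-component normal mixture (illustrative only)."""
    n = len(xs)
    s = sorted(xs)
    lower, upper = s[: n // 2], s[n // 2:]
    # Crude starting values: split the sorted data at the median.
    mu = [sum(lower) / len(lower), sum(upper) / len(upper)]
    sd = fit_single(xs)[1]
    sigma = [sd, sd]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: conditional probability of each component for each point.
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = max(p[0] + p[1], 1e-300)
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)  # guard against collapse
    loglik = sum(
        math.log(max(sum(w[k] * normal_pdf(x, mu[k], sigma[k])
                         for k in range(2)), 1e-300))
        for x in xs
    )
    return w, mu, sigma, loglik


def bic(loglik, n_params, n):
    return n_params * math.log(n) - 2.0 * loglik


# Synthetic log-concentrations: a large "background" group and a smaller
# "hot spot" group. These numbers are invented for the illustration.
random.seed(0)
log_conc = ([random.gauss(1.0, 0.5) for _ in range(300)]
            + [random.gauss(4.0, 0.4) for _ in range(60)])

mu1, sigma1, ll1 = fit_single(log_conc)
w2, mu2, sigma2, ll2 = fit_two(log_conc)

bic_one = bic(ll1, 2, len(log_conc))  # mean and sd
bic_two = bic(ll2, 5, len(log_conc))  # 2 means, 2 sds, 1 free weight

print("BIC, one distribution: ", round(bic_one, 1))
print("BIC, two distributions:", round(bic_two, 1))
print("Estimated component means:", [round(m, 2) for m in mu2])
```

With real data the EM result can depend on the starting values, so it is
worth trying several starts before reading much into the comparison.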

If you conclude that there are two (or more) distributions, then you can
compute, for each data point, the conditional probability that it belongs
to each of the two (or more) distributions, and classify the point into
the distribution for which this probability is highest. After this
exploratory analysis, and if there is evidence for a mixture, you could
treat the two (or more) populations differently, and maybe even perform
separate geostatistical analyses on the separate populations.
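
The classification step can be sketched as follows, again in Python and
again under my own assumptions: the components are normal, and the weights,
means and standard deviations below are hypothetical values standing in for
whatever a mixture fit would give you. The function names are mine, not
from any library.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior(x, weights, mus, sigmas):
    """Conditional probability that x belongs to each component (Bayes' rule)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(joint)
    if total == 0.0:
        # x is so far from every component that the densities underflow.
        return [1.0 / len(joint)] * len(joint)
    return [j / total for j in joint]


def classify(x, weights, mus, sigmas):
    """Index of the component with the highest conditional probability."""
    p = posterior(x, weights, mus, sigmas)
    return max(range(len(p)), key=lambda k: p[k])


# Hypothetical fitted parameters on the log scale (assumed, not estimated
# from any real data set): component 0 = background, component 1 = hot spots.
weights = [0.8, 0.2]
mus = [1.0, 4.0]
sigmas = [0.5, 0.4]

samples = [0.7, 1.9, 2.6, 3.8]
labels = [classify(x, weights, mus, sigmas) for x in samples]
for x, k in zip(samples, labels):
    print(x, "-> component", k, posterior(x, weights, mus, sigmas))
```

The resulting labels could then be used to split the sample locations into
two sets before computing a variogram for each. Points whose highest
probability is close to 0.5 are ambiguous and deserve a second look rather
than a hard assignment.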

I used this general strategy in the analysis of a time series of an index
of returns from investments in financial markets. The strategy was
proposed by Hamilton, 1994, Time Series Analysis, Ch. 22, Princeton U. P.

Ruben
