Re: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Meng-Ying Li
> Meng-Ying,
>
>  For interest's sake, could you perform the same experiment for a
> stationary sample set of size 1000?
>
> Regards Digby

I did that. But with this short influence range of just 3 lags in a
population of size 1000 (0.3% of the domain), the correlation in the data
has little influence on the population variance. That's why I looked
at another data set to make my point.

For people interested in this phenomenon, I used the second realization of
SGSIM.OUT in the GSLIB manual as the population, added coordinates to this
realization by , calculated the omni-directional variogram by
, and the screen output of the  calculation shows the overall
variance, which doesn't fit the sill in the variogram if you set the
maximum lag distance to 30.


Mng-yng


* By using the ai-geostats mailing list you agree to follow its rules 
( see http://www.ai-geostats.org/help_ai-geostats.htm )

* To unsubscribe to ai-geostats, send the following in the subject or in the 
body (plain text format) of an email message to [EMAIL PROTECTED]

Signoff ai-geostats

Re: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Digby Millikan
Meng-Ying,
For interest's sake, could you perform the same experiment for a stationary
sample set of size 1000?
Regards Digby


RE: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Colin Daly
Hi Meng-Ying

The calculation of the experimental variance on a finite set of data (population or sample) is simply a mathematical operation. In itself it has no more meaning than, say, adding the square of the first value to the cube root of the second and dividing the answer by the geometric mean of the rest of them.

What bestows meaning on this particular calculation is (roughly speaking) the assumption that 'each of the 27 values varies about the same mean value with the same distribution of variability at each point'. If we did not believe this (if, for example, each of the 27 points sampled a completely different phenomenon) then there would be little point in using the variance as a means of describing the data (spatial data or not). In other words, for the variance to have the meaning that we usually ascribe to it, as variation about the mean value, we interpret the observations as realisations of some random variable, and the calculated variance as an estimate of the theoretical variance of that RV.

Likewise, if we interpret the data as being spatial data, we may calculate the experimental variogram and try to interpret it as an estimate of the theoretical variogram of some idealised random function. For a standard variogram to be interpretable we need the first and second moments of the first-order increments Z(x+h)-Z(x) to be invariant under translations. Under the more stringent criterion that we may reasonably model our data as 2nd order stationary, the mean of Z(x) is constant everywhere and the variability about the mean is the same at each point, so we can calculate the variance. It is then a theorem, in the context of this model, that the sill is equal to the variance.

Outside the context of the stationarity hypothesis, the variance of the data loses its meaning as variation about a mean value, so it is a meaningless calculation. So it is hardly a surprise, or a concern, that it does not agree with the sill. The variogram seems to retain its objectivity a bit longer, until the increments are no longer well modeled as stationary.

For the small populations that you give, well, the two calculations (variance and sill) are just numbers. If you try to ascribe meaning to them in the context of stationarity, then their likely variations about some 'true' value come into play.
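
The theorem referred to above can be written out in a few lines. This is only a sketch under the second-order stationarity assumption; the symbols m (constant mean), C(h) (covariance at separation h) and \sigma^2 = C(0) are notation assumed here, not taken from the thread.

```latex
\begin{align*}
\gamma(h) &= \tfrac{1}{2}\,\mathrm{E}\big[(Z(x+h)-Z(x))^2\big]
           = \tfrac{1}{2}\,\mathrm{Var}\big[Z(x+h)-Z(x)\big]
           && \text{since } \mathrm{E}[Z(x+h)] = \mathrm{E}[Z(x)] = m \\
          &= \tfrac{1}{2}\big(C(0) + C(0) - 2\,C(h)\big) = C(0) - C(h) \\
\lim_{|h|\to\infty}\gamma(h) &= C(0) = \sigma^2
           && \text{if } C(h) \to 0
\end{align*}
```

So the sill equals the variance of the random function only within this model; nothing in the derivation says that the variance computed from one finite data set must match the sill read off its experimental variogram.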

Anyhow, I will be away for the next few days so will miss the end of this
topic (much to the relief of everyone on ai-geostats, no doubt!), but it was
fun, and took me back a good few years (wishing I had listened a bit better in Matheron's classes on his 'Estimating and Choosing' book)

Regards

Colin Daly

Re: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Digby Millikan
Meng-Ying,
You seem to have a point: the theory of using the sill is not a hard and fast
rule. It is just that if you're conducting a mine study and you plot a variogram,
you can use the sill as a better estimate when you have many samples.
As you say, with only 27 samples the sill is a pretty rough estimate,
as would be expected.
What results do you get for a different 27 samples, or for 1000 samples? From
a practical side, I've never seen a variogram with any functional use
whatsoever from 27 samples.

Digby 



RE: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Meng-Ying Li
Hi Colin,

What I'm talking about in my example is comparing two descriptive
statistics for this population which consists of 27 data points. No
estimation here is involved, so the thing about confidence interval of
the mean or variance is not of concern here. And it doesn't matter which
model I used in the generator or what parameters I used, since I
re-calculated the population sill and variance after the data are
generated.

Let me state this clearly:
(Capitalization indicates highlighting, not speaking tone :p)

1. I generated a POPULATION which is, believe it or not, a series of 27
   data.
2. The POPULATION variance, in my example, doesn't match the POPULATION
   sill calculated in the POPULATION variogram.
3. So how are we going to estimate the POPULATION variance by the sill in
   a SAMPLE, when the sill and the variance in the POPULATION just
   don't match?

And just a personal opinion: I would like to think geostatistical
theories apply to a population of any size, as small as 27 or as large as
1,000,000. If I've made an example where geostatistics doesn't apply, then
there's something to be concerned about in this approach.


Meng



RE: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Colin Daly
Hi Digby

 Yes, I agree with what you say below: if your only aim was to estimate the variance and you could only collect 1000 samples, then choose them to be 'maximally independent' to reduce the variance of the error. But note, as Don said yesterday, a random sample which is clustered will also give an unbiased estimate of the variance, but with a somewhat larger error of estimation. (Of course, there may be reasons not to take all the samples as far from one another as possible, for example to estimate the variogram close to the origin, which is its most important part, but that is another story.) The bit that I disagreed with in your original message was the bit that said
"giving 999 samples to estimate the variance of the 1 million. This will give a better estimate of the variance you could calculate from the million by the least squares classical method, which is what Isobel was saying"
I understood this to say that you would do better with 1000 (or 999) points than with the full million. If that is not what you meant then, yes, I did misunderstand.

Colin

DISCLAIMER:
This message contains information that may be privileged or confidential and is the property of the Roxar Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorised to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.

Re: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Digby Millikan
Mat,
The point is the spatial randomness with which they were sampled. Typically,
in a mining situation, core samples are far from following a spatially random
sampling pattern.

Digby
Geolite Mining Systems
www.users.on.net/~digbym 



RE: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Mat (University Account)
Hi Digby,
  Just a note: in the circumstances that you have just described,
the greater the level and range of autocorrelation, the more precise
your estimate of the mean will be.

If your 1000 cores were randomly sampled from the population of 1 million,
then the fact that some (perhaps many) pairs of datapoints lie less than
the (variogram) range apart
will not matter. s^2 is a valid, unbiased estimate of the population
variance.
(The population is defined here as being the 1,000,000 possible cores that
could be taken from this area, not the process that generated this
realization/data.)

What's more, the typical simple random sample (SRS) standard error (s^2/n)
will perform exactly as expected.

If you chose to use a more sensible design, say a grid (systematic sample),
then your s^2/n would in fact be an _overestimate_ of the standard error.
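
The unbiasedness claim for simple random sampling can be checked by brute force on a small case. The sketch below is only an illustration: the 12-value "population" is hypothetical (smooth, so neighbouring values are strongly correlated) and is not data from this thread. It enumerates every possible sample of size 4 drawn without replacement and averages s^2 over them. The comparison is with the population variance using the N-1 divisor; for N = 1,000,000 the difference from the N divisor is negligible.

```python
from itertools import combinations
from statistics import mean, variance

# Hypothetical, strongly autocorrelated "population" on a regular 1-D grid
# (illustrative values only, not data from the thread).
population = [1.0, 1.2, 1.5, 1.9, 2.4, 2.6, 2.5, 2.1, 1.6, 1.0, 0.7, 0.6]

# Finite-population variance with the N-1 divisor.
S2 = variance(population)

# Every simple random sample (without replacement) of size n is equally likely,
# so averaging s^2 over all of them gives its expectation under the design.
n = 4
sample_variances = [variance(s) for s in combinations(population, n)]
expected_s2 = mean(sample_variances)

print(f"population S^2       = {S2:.6f}")
print(f"average of s^2 (SRS) = {expected_s2:.6f}")
```

The two printed numbers agree even though many of the samples contain neighbouring, highly correlated values: the randomness that matters here comes from the sampling design, not from the spatial process.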

Mat


Re: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Digby Millikan
Colin,

You misunderstood me; the 1 million data is the total unknown dataset. Say
you have a volume in a mine and its volume is 1 million 1 metre core samples.
You drill the volume and have a sample set of 1000 1m core samples. You then
analyse the statistics of the 1000 samples to try and estimate the variance
of the total volume (1 million core samples).  So your estimate of the
variance comes from the 1000 samples. You can plot the variogram of the 1000
samples and you can also calculate its variance. You are trying to estimate
the variance of the 1 million pieces of core which you do not have. So you
must decide whether your 1000 sample set is a true representation of the
1 million.
Our argument is that samples within the 1000 which are clustered together do
not create a good representation of the true dataset and will create a
biased estimate.

Digby
www.users.on.net/~digbym 



RE: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Colin Daly
Hi Meng-Ying

27 points - you can't really calculate a variogram. With a range of 3, you have about 9 correlation lengths in the field. So as a crude approximation, even the standard deviation on the estimate of the mean would be of the order of s.d/sqrt(9) (I vaguely remember trying to get a more accurate version of this in the case of a Gaussian RF as an exercise in one of Matheron's classes...)

so with s.d = 2.8 (or 2.4, similar answers), the standard error is 2.8/3 = 0.9 (approx)

so your confidence interval for the mean would be [m-1.8, m+1.8]

- this is the same order for both the estimate of the sill and for the direct estimate of the variance... both are bad

That is for the comparatively easy case of the mean. The situation for the variance is even worse, so there is no way that you can complain about the quality of the estimate.
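
The back-of-envelope arithmetic above can be reproduced as follows. This is a sketch that takes the figures in the message at face value (27 points, range 3, 2.8 used as the s.d.); the variable names are made up for the illustration.

```python
import math

# Figures quoted in the message above, taken at face value.
n_points = 27     # length of the 1-D series
corr_range = 3    # range of influence, in grid units
sd = 2.8          # value used as "s.d" in the message

# Crude effective sample size: number of correlation lengths in the field.
n_eff = n_points / corr_range

# Standard error of the mean under that approximation.
std_err = sd / math.sqrt(n_eff)

# Rough half-width of a two-standard-error interval around the mean.
half_width = 2 * std_err

print(f"n_eff = {n_eff:.0f}, standard error = {std_err:.2f}, half-width = {half_width:.2f}")
```

Rounding the standard error to 0.9 before doubling gives the 1.8 quoted in the message; without the intermediate rounding the half-width is closer to 1.87.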

I'm not sure if you are suggesting that you should get different answers, or that there is some bias involved, but to convince yourself that there is not,
repeat your experiment but use a length of 1,000,000 instead of 27. Then at least we would get rid of most of the statistical fluctuations, and the estimates should be similar. How are you generating the random sequence? Is it an AR process or something where the variance is known theoretically?

Colin 


RE: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Colin Daly
Hi Digby

Sorry to say, but suggesting that less data is systematically better is mistaken. This is fundamental, and is contained in the intro pages of any good intro to geostats. If the data is clustered, then you might have to decluster in some sense. But with an unbiased sample you will use all million samples. Please, any new users of geostats lucky enough to have a million samples: don't throw 99.9% of your data away!!

Declustering is about trying to remove the bias that most realistic sampling strategies have (e.g. in petroleum, you tend to drill into the best reservoir regions first...). If your data is an unbiased sample from the true histogram (i.e. what you would get by mining out the resource fully) then you will use all of it for estimating any statistic. This does not mean that the samples have to be far apart, just that they don't cluster into high or low regions.

There seems to be some confusion about independence and estimates. Suppose the mean (and/or variance) is being estimated (provisos: 1) unbiased sample data, 2) stationary, so that mean and variance have a meaning). Then the estimate is unbiased, irrespective of the correlation of the data. What does depend on the correlation is the error in the estimation. For a zero correlation length, the variance of the error of the mean drops as 1/n. For a non-zero correlation length it drops more slowly than 1/n, but you do not get quicker convergence by throwing away good data. In fact, virtually always you will get a strictly worse estimate!
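
A small calculation illustrates the last point. This is a sketch under an assumption that is not in the thread: the correlation between two values `lag` grid steps apart is taken to be rho**lag (an AR(1)-like, exponential-type model), and the function name and parameters are hypothetical. Under that model the variance of the mean of n equally spaced values can be computed exactly, for all the data and for a thinned set that keeps every other value.

```python
def variance_of_mean(n, rho, sigma2=1.0, step=1):
    """Exact variance of the mean of n equally spaced values whose
    correlation at separation k samples is rho**(step * k) (assumed model)."""
    r = rho ** step
    # n pairs at lag 0, and 2 * (n - k) ordered pairs at each lag k >= 1.
    total = n + 2.0 * sum((n - k) * r ** k for k in range(1, n))
    return sigma2 * total / n ** 2

n = 1000
independent = variance_of_mean(n, rho=0.0)           # no correlation: sigma2 / n
correlated = variance_of_mean(n, rho=0.5)            # all 1000 correlated values
thinned = variance_of_mean(n // 2, rho=0.5, step=2)  # keep every other value

print(f"independent, n=1000 : {independent:.6f}")
print(f"correlated,  n=1000 : {correlated:.6f}")
print(f"thinned,     n=500  : {thinned:.6f}")
```

With these settings the correlated data give a larger variance of the mean than independent data would, and thinning the correlated data makes it larger still, which is the behaviour described above.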

Regards

Colin


-Original Message-
From:   Digby Millikan [mailto:[EMAIL PROTECTED]]
Sent:   Wed 12/8/2004 5:12 PM
To: ai-geostats; Meng-Ying  Li
Cc:
Subject:        Re: [ai-geostats] Re: Sill versus least-squares classical variance estimate
Dear Meng-Ying,

 It's not that you are defining the variance to be the variance of data
beyond the range of the variogram. Say you have a panel made up of
1 million samples which cover the entire panel, then you select 1000 samples
to estimate the variance. If two samples of the thousand are within range of
each other (close and of similar value), then you are effectively doubling up
on one of the samples, so to give a better representation of the 1 million
samples you are better off removing the doubled-up sample, giving 999
samples to estimate the variance of the 1 million. This will give a better
estimate of the variance you could calculate from the million by the least
squares classical method, which is what Isobel was saying.

Regards Digby











Re: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Meng-Ying Li
Hi Digby and All,

I did a little experiment on the idea that Digby mentioned (the sill will
estimate the population variance), but found it not to be true in my experiment:

1. I generated a set of one-dimensional data with 27 points at regular
   unit spacing, which I'd like to take as the true, or population,
   values. On purpose, I generated the data so that it has an influence range of
   three length units.
2. I calculated the experimental variogram. Notice that this variogram is
   the population variogram. The sill value is around 2.8.
3. But the population variance is 2.39, lower than the sill value.

This confirms my doubt about using the sill value as the estimate of the
population variance, since I calculated the variogram and variance based on
all data points. Please tell me what you think. The data I generated are
as follows:

0.056970748
0.14520424
0.849710204
1.650514605
1.101666385
1.015177986
2.150259206
2.830780659
0.223495817
-2.47615958
-3.372697392
-0.530685611
0.786582177
0.970673
0.674755256
0.338461632
1.020874834
0.410936991
1.702892405
2.649748012
4.290179731
3.442015668
1.488818953
0.862788738
0.728709892
2.398182914
1.522546427
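
For anyone who wants to repeat the comparison, the sketch below recomputes both descriptive statistics from the 27 values listed above. It assumes the classical (Matheron) estimator on a unit-spaced grid and the divisor n for the variance; the thread does not say which program or settings were used, so the function and variable names are made up for the illustration. It also checks an algebraic identity that is not discussed in the thread: the divisor-n variance equals the average of the semivariogram over all n^2 ordered pairs of points (zero-distance pairs included), which suggests one reading of why that variance can sit below the sill.

```python
# The 27 values posted in the message above, on a regular unit-spaced 1-D grid.
z = [
    0.056970748, 0.14520424, 0.849710204, 1.650514605, 1.101666385,
    1.015177986, 2.150259206, 2.830780659, 0.223495817, -2.47615958,
    -3.372697392, -0.530685611, 0.786582177, 0.970673, 0.674755256,
    0.338461632, 1.020874834, 0.410936991, 1.702892405, 2.649748012,
    4.290179731, 3.442015668, 1.488818953, 0.862788738, 0.728709892,
    2.398182914, 1.522546427,
]
n = len(z)

# Population variance (divisor n), the statistic reported as 2.39 in the message.
mean_z = sum(z) / n
pop_var = sum((v - mean_z) ** 2 for v in z) / n

def semivariogram(values, lag):
    """Classical experimental semivariogram at an integer lag on a regular grid."""
    pairs = len(values) - lag
    return sum((values[i + lag] - values[i]) ** 2 for i in range(pairs)) / (2 * pairs)

gamma = {h: semivariogram(z, h) for h in range(1, n)}

# Average of the semivariogram over all n^2 ordered pairs of points:
# 2 * (n - h) ordered pairs at each lag h >= 1, plus n zero-lag pairs with gamma = 0.
avg_gamma = 2.0 * sum((n - h) * gamma[h] for h in range(1, n)) / n ** 2

print(f"population variance = {pop_var:.3f}")
for h in range(1, 11):
    print(f"lag {h:2d}: gamma = {gamma[h]:.3f}  ({n - h} pairs)")
print(f"average of gamma over all ordered pairs = {avg_gamma:.3f}")
```

Note that the number of pairs falls as the lag grows, so the experimental values at long lags rest on very few pairs.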



[ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Isobel Clark
Meng-Ying

No, I do not think we are communicating.

The variance of data values is not affected by
correlation between the sample values.

The estimated variance for the population IS affected
by correlation between the sample values. Statistical
inference about the population is based on the
assumption that samples were taken randomly and
independently from that population. 

It is the process of estimation of unknown parameters
by classical statistical theory which requires these
assumptions.

Geostatistical inference does not require absence of
correlation, quite the contrary. The semi-variogram
graph is constructed on the assumption that there is a
correlation between samples and that this depends on
distance and direction between the pair of samples.

If we have a stationary situation, where the mean and
variance are constant over the study area, the
semi-variogram generally reaches a sill value. The
distance at which this happens is interpreted as that
distance beyond which the correlation is zero. Sample
pairs at this distance or greater can be used to
estimate the variance, since the statistical
assumptions are now satisfied.

Isobel
http://geoecosse.bizland.com/whatsnew.htm




 --- Meng-Ying  Li <[EMAIL PROTECTED]> wrote: 
> Hi Isobel,
>
> I understand all the points you made, but I'm not sure why the variance
> should be defined on data that are NOT SPATIALLY CORRELATED when they may
> or may not be correlated.
>
> Thanks for the clarification, though. I don't think I'd be able to
> clarify the things you clarified. You're good.
>
>
> Meng-ying
> 
> On Wed, 8 Dec 2004, Isobel Clark wrote:
>
> > Meng-Ying
> >
> > I don't know how to say this any other way. At distances larger than
> > the range of influence, samples are NOT SPATIALLY CORRELATED.
> >
> > The variance of the difference between two uncorrelated samples is
> > twice the variance of one sample around the mean.
> >
> > The semi-variogram is one-half of the variance of the difference.
> >
> > Hence the sill is (theoretically) equal to the variance. The sill is
> > based on all pairs of samples found at a distance greater than the
> > range of influence.
> >
> > The classical statistical estimator of the variance is only unbiassed
> > if the correct degrees of freedom are used. If the samples are
> > correlated, n-1 is NOT the correct degrees of freedom.
> >
> > All explained in immense detail in Practical Geostatistics 2000,
> > Clark and Harper, http://geoecosse.hypermart.net
> >
> > Did I get it clear this time?
> > Isobel
> >
> >  --- Meng-Ying  Li <[EMAIL PROTECTED]> wrote:
> > > I understand why it is not appropriate to force the sill so it
> > > matches the sample variance. My question is, why estimate the
> > > overall variance by the sill value when data are actually correlated?
> > >
> > >
> > > Meng-ying
> > >
> > > On Tue, 7 Dec 2004, Isobel Clark wrote:
> > >
> > > > Meng-Ying
> > > >
> > > > We are talking about estimating the variance of a set of
> > > > samples where spatial dependence exists.
> > > >
> > > > The classical statistical unbiassed estimator of the
> > > > population variance is s-squared which is the sum of the
> > > > squared deviations from the mean divided by the relevant
> > > > degrees of freedom. If the samples are not inter-correlated,
> > > > the relevant degrees of freedom are (n-1). This gives the
> > > > formula you find in any introductory statistics book or
> > > > course.
> > > >
> > > > If samples are not independent of one another, the degrees of
> > > > freedom issue becomes a problem and the classical estimator
> > > > will be biassed (generally too small on average).
> > > >
> > > > In theory, pairs of samples beyond the range of influence on
> > > > a semi-variogram graph are independent of one another. In
> > > > theory, the variance of the difference between two values
> > > > which are uncorrelated is twice the variance of one sample
> > > > around the population mean. This is thought to be why
> > > > Matheron defined the semi-variogram (one-half the squared
> > > > difference) so that the final sill would be (theoretically)
> > > > equal to the population variance.
> > > >
> > > > There are computer software packages which will draw a line
> > > > on your experimental semi-variogram at the height equivalent
> > > > to the classically calculated sample variance. Some people
> > > > try to force their semi-variogram models to go through this
> > > > line. This is dumb as the experimental sill is a better
> > > > estimate because it does have the degrees of freedom it is
> > > > supposed to have.
> > > >
> > > > I am not sure whether this is clear enough. If you email me
> > > > off the list, I can recommend publications which might help
> > > > you out.
> > > >
> > > > Isobel
> >

Re: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Digby Millikan
Dear Meng-Ying,
Imagine the 1 million samples (the total dataset and area) overlying
a pattern of 1000 low-grade and high-grade regions. For your 1000-sample
set you would want only one sample from each low-grade and each
high-grade region. If you had two samples in one low-grade region, that
region would be overweighted (introducing bias into your estimate),
so you would want to remove the extra sample. Two samples in the same
low-grade patch will be within range of each other.

Digby

Re: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Digby Millikan
Dear Meng-Ying,
It's not that you are defining the variance to be the variance of data
beyond the range of the variogram. Say you have a panel made up of
1 million samples that cover the entire panel, and you select 1000
samples to estimate the variance. If two of the thousand are within range
of each other (close together and similar in value), you are effectively
doubling up on one of the samples. To represent the 1 million samples
better, you should remove the doubled-up sample, leaving 999 samples to
estimate the variance of the 1 million. This will give a better estimate
of the variance you would calculate from the full million by the classical
least-squares method, which is what Isobel was saying.
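
A rough sketch of the "doubling up" effect, under stated assumptions: the
data are a simulated 1-D moving-average series with unit variance and a
range of 20 lags, and the sample size of 10 is arbitrary. None of these
names or values come from the thread. It compares the classical n-1
estimator on samples taken within range of each other against the same
estimator on samples spaced a full range apart.

```python
import random
import statistics


def moving_average_field(n, k, rng):
    """1-D stationary series with unit variance; correlation is zero at lags >= k."""
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + k - 1)]
    scale = k ** 0.5
    return [sum(eps[i:i + k]) / scale for i in range(n)]


rng = random.Random(1)     # fixed seed so the run is repeatable
k = 20                     # assumed range of influence, in lags
n = 10                     # samples per estimate
z = moving_average_field(40000, k, rng)

# Case 1: n adjacent samples, all well within range of each other.
close = [statistics.variance(z[s:s + n])
         for s in range(0, len(z) - n, 200)]

# Case 2: n samples spaced k lags apart, so no pair is within range.
spaced = [statistics.variance(z[s:s + n * k:k])
          for s in range(0, len(z) - n * k, 200)]

mean_close = statistics.mean(close)
mean_spaced = statistics.mean(spaced)

print("mean s^2, samples within range:", round(mean_close, 3))
print("mean s^2, samples beyond range:", round(mean_spaced, 3))
```

In this setup the within-range samples give a variance estimate that is
much too small on average, while the spaced samples recover roughly the
true value of 1.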

Regards Digby

Re: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Meng-Ying Li
Thanks Digby,

Your answer comes closer to the question I asked. In this case I assume
that you define the overall variance of a random field to be the variance
of data spaced beyond the variogram range-- which I can buy, but I'm not
quite sure this definition is practical in all cases-- and that's why I
asked this question about estimating variance initially. From my point of
view, the expected variance for samples with CSR would be a better
definition of the overall variance. That's a personal preference, however.
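
A small sketch of that alternative definition, reading CSR as complete
spatial randomness (an assumption about the abbreviation). The "population"
here is a hypothetical exhaustive 1-D correlated series, and the sample
size and number of repeats are arbitrary. Sample locations are drawn at
random without replacement, and the classical n-1 variance is averaged
over many draws.

```python
import random
import statistics


def moving_average_field(n, k, rng):
    """1-D stationary series with unit variance; correlation is zero at lags >= k."""
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + k - 1)]
    scale = k ** 0.5
    return [sum(eps[i:i + k]) / scale for i in range(n)]


rng = random.Random(2)                       # fixed seed so the run is repeatable
population = moving_average_field(5000, 20, rng)
pop_var = statistics.variance(population)    # variance of the exhaustive data set

# Repeatedly draw 30 values at random locations (no replacement) and
# average the classical n-1 variance estimates.
estimates = [statistics.variance(rng.sample(population, 30))
             for _ in range(500)]
mean_estimate = statistics.mean(estimates)

print("variance of exhaustive data:", round(pop_var, 3))
print("mean s^2 over random samples:", round(mean_estimate, 3))
```

Under purely random location sampling the classical estimator averages out
to the variance of the exhaustive data set, even though the underlying
values are spatially correlated. That quantity is the variance within this
particular area, which need not equal the sill.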

And since you mentioned declustering, I do know a few declustering
approaches that will solve the problem of data clusters, but it is
doubtful whether these approaches remove all effects of correlation
between point data.

I'm sure I understand all points of the replies to my question. I think
I'm just trying to make sure the definition of variance applies to all
cases of application.

Meng-ying

 On Wed, 8 Dec 2004, Digby Millikan wrote:

> Meng,
>
>  You want to have an evenly spaced sample pattern for your estimation
> of the variance. If you use samples within range of each other, these
> are clusters of samples which will overweight that area; hence by removing
> samples closer together than the range, you remove "clusters" of samples.
> A common method of performing statistics on spatial data is first to
> perform data declustering, then calculate your statistics. However, as
> Isobel points out, a fast way to do this is to remove samples closer
> together than the range.
>
> Digby
>
>
>
>

Re: [ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Digby Millikan
Meng,
If your sample grid spacing is regular, I assume it wouldn't make
much difference. In mining, however, drilling campaigns commonly
have heavy clustering of drillhole data in high-grade and
anomalous areas, and the same goes for grade control and other
forms of sampling.
Digby

[ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-08 Thread Digby Millikan
Meng,
You want to have an evenly spaced sample pattern for your estimation
of the variance. If you use samples within range of each other, these
are clusters of samples which will overweight that area; hence by removing
samples closer together than the range, you remove "clusters" of samples.
A common method of performing statistics on spatial data is first to
perform data declustering, then calculate your statistics. However, as
Isobel points out, a fast way to do this is to remove samples closer
together than the range.
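
One common form of declustering is cell declustering, sketched below under
stated assumptions: the coordinates, grades and cell size are invented for
illustration, and the function names are hypothetical rather than taken
from any particular package. Each sample is weighted by one over the number
of samples sharing its cell, so a heavily drilled patch counts no more than
a sparsely drilled one.

```python
from collections import Counter


def cell_decluster_weights(coords, cell_size):
    """Weight each sample by 1 / (number of samples in its grid cell), normalised to sum to 1."""
    cells = [(int(x // cell_size), int(y // cell_size)) for x, y in coords]
    counts = Counter(cells)
    raw = [1.0 / counts[c] for c in cells]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean_var(values, weights):
    """Weighted mean and weighted variance about that mean (weights sum to 1)."""
    mean = sum(w * v for w, v in zip(weights, values))
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    return mean, var


# Made-up data: one sample at the centre of each 10 x 10 cell on a 4 x 4 grid.
# Cell (0, 0) is a high-grade patch (grade 5.0); everywhere else is 1.0.
coords = [(5.0 + 10 * i, 5.0 + 10 * j) for i in range(4) for j in range(4)]
values = [5.0 if (i, j) == (0, 0) else 1.0 for i in range(4) for j in range(4)]

# Eight extra holes clustered inside the high-grade patch.
coords += [(1.0 + m, 2.0) for m in range(8)]
values += [5.0] * 8

equal = [1.0 / len(values)] * len(values)
declustered = cell_decluster_weights(coords, cell_size=10.0)

naive_mean, naive_var = weighted_mean_var(values, equal)
decl_mean, decl_var = weighted_mean_var(values, declustered)

print("equal weights:      mean", naive_mean, "variance", naive_var)
print("declustered weights: mean", decl_mean, "variance", decl_var)
```

In this toy case the clustered holes pull both the mean and the variance
upward under equal weighting. The result depends on the cell size chosen,
and declustering corrects the weighting of areas rather than removing the
correlation between nearby samples.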

Digby

[ai-geostats] Re: Sill versus least-squares classical variance estimate

2004-12-07 Thread Meng-Ying Li
I understand why it is not appropriate to force the sill so it matches the
sample variance. My question is, why estimate the overall variance by the
sill value when data are actually correlated?


Meng-ying

On Tue, 7 Dec 2004, Isobel Clark wrote:

> Meng-Ying
>
> We are talking about estimating the variance of a set
> of samples where spatial dependence exists.
>
> The classical statistical unbiassed estimator of the
> population variance is s-squared which is the sum of
> the squared deviations from the mean divided by the
> relevant degrees of freedom. If the samples are not
> inter-correlated, the relevant degrees of freedom are
> (n-1). This gives the formula you find in any
> introductory statistics book or course.
>
> If samples are not independent of one another, the
> degrees of freedom issue becomes a problem and the
> classical estimator will be biassed (generally too
> small on average).
>
> In theory, pairs of samples beyond the range of
> influence on a semi-variogram graph are independent of
> one another. In theory, the variance of the difference
> between two values which are uncorrelated is twice the
> variance of one sample around the population mean.
> This is thought to be why Matheron defined the
> semi-variogram (one-half the squared difference) so
> that the final sill would be (theoretically) equal to
> the population variance.
>
> There are computer software packages which will draw a
> line on your experimental semi-variogram at the height
> equivalent to the classically calculated sample
> variance. Some people try to force their
> semi-variogram models to go through this line. This is
> dumb as the experimental sill is a better estimate
> because it does have the degrees of freedom it is
> supposed to have.
>
> I am not sure whether this is clear enough. If you
> email me off the list, I can recommend publications
> which might help you out.
>
> Isobel
> http://geoecosse.bizland.com/books.htm
>
>  --- Meng-Ying  Li <[EMAIL PROTECTED]> wrote:
> > Hi Isobel,
> >
> > Could you explain why it would be a better estimate
> > of the variance when
> > independence is considered? I'd rather think that we
> > consider the
> > dependence when the overall variance is to be
> > estimated-- if there
> > actually is dependence between values.
> >
> > Or are you talking about modeling the sill value by the
> > stabilizing tail on
> > the experimental variogram, instead of modeling by
> > the calculated overall
> > variance?
> >
> > Or, are we talking about variance of different
> > definitions? I'd be
> > concerned if I missed some point of the original
> > definition for variances,
> > like, the variance should be defined with no
> > dependence between values or
> > something like that. Frankly, I don't think I took
> > the definition of
> > variance too seriously when I was learning stats.
> >
> >
> > Meng-ying
> >
> > > Digby
> > >
> > > I see where you are coming from on this, but in fact the sill
> > > is composed of those pairs of samples which are independent of
> > > one another - or, at least, have reached some background
> > > correlation. This is why the sill makes a better estimate of
> > > the variance than the conventional statistical measures, since
> > > it is based on independent sampling.
> > >
> > > Isobel
> >
>
