Re: [ai-geostats] Re: F and T-test for samples drawn from the same p

2004-12-07 Thread Chaosheng Zhang
Dear Isobel,

Thanks for the information. Perhaps I didn't explain my request clearly.
What I need is to verify the ideas you suggested in the previous message.
Specifically, (1) Has anybody used the sill values (in geostatistics) to
replace the variances (in classical statistics) in the F test? (2) Has anybody
used the global standard errors (in geostatistics) to replace the standard
errors of the mean (in classical statistics) in the t-test?

Cheers,

Chaosheng


- Original Message - 
From: Isobel Clark [EMAIL PROTECTED]
To: Chaosheng Zhang [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, December 06, 2004 6:03 PM
Subject: [ai-geostats] Re: F and T-test for samples drawn from the same p


 There was a pretty good paper on global standard errors
 in the 1984 APCOM proceedings, so I am sure it should
 be in the major textbooks by now.

 Comparing the sills is very straightforward, I think.

 Isobel
 http://geecosse.bizland.com/books.htm

  --- Chaosheng Zhang [EMAIL PROTECTED]
 wrote:
  Isobel,
 
  Good idea, and that's a step forward. Any references
  or is it still an idea?
 
  Cheers,
 
  Chaosheng
 
  - Original Message - 
  From: Isobel Clark [EMAIL PROTECTED]
  To: AI Geostats mailing list [EMAIL PROTECTED]
  Sent: Monday, December 06, 2004 1:07 PM
  Subject: Re: [ai-geostats] F and T-test for samples
  drawn from the same p
 
 
   Dear all
  
   I am having difficulty understanding why none of you want to try a
   spatial approach to statistics. Everyone is trying to make the
   'independent' statistical tests work on spatial data. Try turning
   this around and look at the spatial aspect first.
  
   (1) Testing variances: the sill on the semi-variogram (total height
   of model) is theoretically a good estimate for the sample variance
   when auto-correlation or spatial dependence is present. Do your F
   test on that. Yes, you still have degrees of freedom problems, but
   with thousands of samples the 'infinity column' should be
   sufficient.
  
   (2) Testing means: the classic t-test in the presence of 'equal
   variances' requires the 'standard error' of each mean. For
   independent samples, this is s/sqrt(n). For spatially dependent
   samples, this is the kriging standard error for the global mean.
   Your only problem then is getting a global standard error.
  
   Isobel
   http://geoecosse.bizland.com/whatsnew.htm
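
One possible reading of points (1) and (2) above, as a minimal Python sketch. The function names and the input numbers are hypothetical, and combining the two standard errors by adding their squares assumes the two global mean estimates are independent of each other, which the message does not state. The large-sample normal approximation stands in for the t table; no F critical value is computed here.

```python
import math

def sill_f_ratio(sill_a, sill_b):
    """Variance-ratio statistic using fitted sills in place of sample variances."""
    larger, smaller = max(sill_a, sill_b), min(sill_a, sill_b)
    return larger / smaller

def global_mean_z(mean_a, se_a, mean_b, se_b):
    """Difference of two global means divided by its standard error.

    Assumption: the two estimates are independent, so error variances add.
    """
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)

def two_sided_normal_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical inputs: two fitted sills, two global means with their
# (kriging) standard errors supplied from elsewhere.
f_ratio = sill_f_ratio(2.0, 8.0)
z = global_mean_z(10.0, 0.3, 9.0, 0.4)
p = two_sided_normal_p(z)
print(f"F ratio of sills = {f_ratio:.2f}, z = {z:.2f}, p = {p:.4f}")
```

The sketch only does the arithmetic; obtaining the sills and the global standard errors is the hard part and is left outside it.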
  
  
 
 
 
  
 
 
   * By using the ai-geostats mailing list you agree
  to follow its rules
   ( see
  http://www.ai-geostats.org/help_ai-geostats.htm )
  
   * To unsubscribe to ai-geostats, send the
  following in the subject or in
  the body (plain text format) of an email message to
  [EMAIL PROTECTED]
  
   Signoff ai-geostats
  
 
 

Re: [ai-geostats] Re: F and T-test for samples drawn from the same p

2004-12-07 Thread Isobel Clark
Digby

I see where you are coming from on this, but in fact
the sill is composed of those pairs of samples which
are independent of one another - or, at least, have
reached some background correlation. This is why the
sill makes a better estimate of the variance than the
conventional statistical measures, since it is based
on independent sampling.

Isobel
http://geoecosse.bizland.com/whatsnew.htm


 --- Digby Millikan [EMAIL PROTECTED] wrote: 
 While you're talking about sills being the global variance, which I
 read everywhere, isn't the global variance actually slightly less
 than the sill, as the values below the range of the variogram are not
 included? I.e., the sill would be the global variance when you have
 pure nugget effect.
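
The point raised here can be checked numerically. Under a constant mean, the expected value of the ordinary sample variance equals the average semi-variogram value over all distinct pairs of sample locations. The sketch below evaluates that average for a hypothetical 10 x 10 grid and a spherical model; the grid, the range of 5 and the sill of 1 are illustrative assumptions, not values from the thread.

```python
import itertools
import math

def spherical(h, nugget, partial_sill, a):
    """Spherical semi-variogram with range a; sill = nugget + partial_sill."""
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + partial_sill
    r = h / a
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def expected_sample_variance(points, gamma):
    """Mean of gamma(h) over all distinct pairs = E[s^2] under a constant mean."""
    pairs = list(itertools.combinations(points, 2))
    return sum(gamma(math.dist(p, q)) for p, q in pairs) / len(pairs)

# Hypothetical sampling layout: a 10 x 10 unit grid.
grid = [(float(i), float(j)) for i in range(10) for j in range(10)]

# Both models have a sill of 1.0.
structured = expected_sample_variance(
    grid, lambda h: spherical(h, nugget=0.0, partial_sill=1.0, a=5.0))
pure_nugget = expected_sample_variance(
    grid, lambda h: spherical(h, nugget=1.0, partial_sill=0.0, a=5.0))

print(f"structured model: {structured:.3f}")
print(f"pure nugget:      {pure_nugget:.3f}")
```

With spatial structure the expected sample variance falls below the sill because pairs closer than the range contribute less than the sill; with pure nugget the two coincide.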
 
 
 

Re: [ai-geostats] Re: F and T-test for samples drawn from the same p

2004-12-07 Thread Meng-Ying Li
Hi Isobel,

Could you explain why it would be a better estimate of the variance when
independence is considered? I'd rather think that we consider the
dependence when the overall variance is to be estimated, if there
actually is dependence between values.

Or are you talking about modeling the sill value by the stabilizing tail on
the experimental variogram, instead of modeling by the calculated overall
variance?

Or, are we talking about variances of different definitions? I'd be
concerned if I missed some point of the original definition of variance,
like, the variance should be defined with no dependence between values or
something like that. Frankly, I don't think I took the definition of
variance too seriously when I was learning stats.


Meng-ying

 Digby

 I see where you are coming from on this, but in fact
 the sill is composed of those pairs of samples which
 are independent of one another - or, at least, have
 reached some background correlation. This is why the
 sill makes a better estimate of the variance than the
 conventional statistical measures, since it is based
 on independent sampling.

 Isobel


[ai-geostats] RE: F and T-test for samples drawn from the same p

2004-12-05 Thread Isobel Clark
Hence my recommendation to use cross cross validation
Isobel
http://geoecosse.bizland.com/books.htm



 --- Colin Daly [EMAIL PROTECTED] wrote: 
 
 
 Hi
 
 Sorry to repeat myself - but the samples are not
 independent.  Independence is a fundamental
 assumption of these types of tests - and you cannot
 interpret the tests if this assumption is violated. 
 In the situation where spatial correlation exists,
 the true standard error is nothing like as small as
 the (s/sqrt(n)) that Chaosheng discusses - because
 the sqrt(n) depends on independence.
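
To give a feel for the size of the effect, the variance of an arithmetic mean of n correlated values is (1/n^2) times the sum of all pairwise covariances. The sketch below evaluates this for a hypothetical transect of 200 unit-spaced samples; the exponential covariance model, its correlation length of 10 and the unit variance are assumptions chosen only for illustration.

```python
import math

def se_of_mean(coords, cov):
    """Standard error of the arithmetic mean given a covariance function of lag."""
    n = len(coords)
    total = sum(cov(abs(xi - xj)) for xi in coords for xj in coords)
    return math.sqrt(total) / n

sigma2 = 1.0        # assumed point variance
corr_length = 10.0  # assumed correlation length of the exponential model
coords = [float(i) for i in range(200)]  # hypothetical 1-D transect

def exp_cov(h):
    """Exponential covariance; this parameterisation is an assumption."""
    return sigma2 * math.exp(-h / corr_length)

def indep_cov(h):
    """Covariance of independent samples: variance at lag 0, zero elsewhere."""
    return sigma2 if h == 0 else 0.0

se_naive = math.sqrt(sigma2 / len(coords))  # the s/sqrt(n) form
se_corr = se_of_mean(coords, exp_cov)
se_indep = se_of_mean(coords, indep_cov)

print(f"s/sqrt(n):           {se_naive:.4f}")
print(f"independent samples: {se_indep:.4f}")
print(f"correlated samples:  {se_corr:.4f}")
```

The independent case reproduces s/sqrt(n), while positive correlation between neighbours inflates the standard error well beyond it.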
 
 Again, as I said before, if the data has any type of
 trend in it, then it is completely meaningless to
 try and use these tests - and with no trend but some
 'ordinary' correlation, you must find a means of
 taking the data redundancy into account or risk getting
 hopelessly pessimistic results (in the sense of
 rejecting the null hypothesis of equal means far too
 often).
 
 Consider a trivial example. A one dimensional random
 function which takes constant values over intervals
 of length one - so, it takes the value a_0 in the
 interval [0,1[  then the value a_1 in the interval
 [1,2[ and so on (let us suppose that each a_n term
 is drawn at random from a Gaussian distribution with
 the same mean and variance for example).  Next
 suppose you are given samples on the interval [0,2].
 You spot that there seems to be a jump between [0,1[
 and [1,2[  - so you test for the difference in the
 means. If you apply an F test you will easily find
 that the mean differs (and more convincingly the
 more samples you have drawn!). However by
 construction of the random function,  the mean is
 not different.  We have been lulled into the false
 conclusion of differing means by assuming that all
 our data are independent.
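
The example above can be simulated. This is a rough sketch rather than a reproduction of anyone's analysis: a Welch-type two-sample t statistic stands in for the test, 1.96 is used as a large-sample critical value, and a small measurement noise (noise_sd, an assumption not in the example) is added so that the within-interval variance is not exactly zero.

```python
import math
import random
import statistics

def welch_t(x, y):
    """Two-sample t statistic that treats every sample as independent."""
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    return (statistics.fmean(x) - statistics.fmean(y)) / se

def rejection_rate(n_per_interval, trials=1000, noise_sd=0.1, seed=0):
    """Fraction of realisations in which 'equal means' is rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        # a_0 and a_1 come from the same Gaussian, so the means are equal
        # by construction.
        a0 = rng.gauss(0.0, 1.0)
        a1 = rng.gauss(0.0, 1.0)
        # Samples in [0,1[ all see a_0, samples in [1,2[ all see a_1,
        # plus the assumed small measurement noise.
        x = [a0 + rng.gauss(0.0, noise_sd) for _ in range(n_per_interval)]
        y = [a1 + rng.gauss(0.0, noise_sd) for _ in range(n_per_interval)]
        if abs(welch_t(x, y)) > 1.96:
            rejected += 1
    return rejected / trials

rate_small = rejection_rate(5)
rate_large = rejection_rate(50)
print(f"rejection rate with  5 samples per interval: {rate_small:.2f}")
print(f"rejection rate with 50 samples per interval: {rate_large:.2f}")
```

A test run at a nominal 5% level should reject about 5% of the time when the null hypothesis holds; here the rejection rate is far higher because the samples within each interval are almost perfectly redundant.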
 
 Regards
 
 Colin Daly
 
 
 -Original Message-
 From: Chaosheng Zhang
 [mailto:[EMAIL PROTECTED]
 Sent: Sun 12/5/2004 11:42 AM
 To:   [EMAIL PROTECTED]
 Cc:   Colin Badenhorst; Isobel Clark; Donald E. Myers
 Subject:  Re: [ai-geostats] F and T-test for samples
 drawn from the same p
 Dear all,
 
 
 
 I'm wondering if sample size (number of samples, n)
 is playing a role here.
 
 
 
 Since Colin is using Excel to analyse several
 thousand samples, I have checked the functions of
 t-tests in Excel. In the Data Analysis Tools help, a
 function is provided for t-Test: Two-Sample
 Assuming Unequal Variances analysis. This function
 is the same as those from many text books (There are
 other forms of the function). Unfortunately, I
 cannot find the function for assuming equal
 variances in Excel, but I assume they are similar,
 and should be the same as those from some text
 books.
 
 
 
 From the function, you can find that when the sample
 size is large you always get a large t value. When
 sample size is large enough, even slight differences
 between the mean values of two data sets (x bar and
 y bar) can be detected, and this will result in
 rejection of the null hypothesis. This is in fact
 quite reasonable. When the sample size is large, you
 are confident with the mean values (Central Limit
 Theorem), with a very small standard error
 (s/sqrt(n)). Therefore, you are confident to detect
 the differences between the two data sets. Even
 though there is only a slight difference, you can
 still say, yes, they are significantly different.
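
The arithmetic behind this can be sketched with the standard textbook form of the unequal-variance statistic, t = (xbar - ybar) / sqrt(s_x^2/n_x + s_y^2/n_y). The means, standard deviations and sample sizes below are hypothetical, and the samples are taken to be independent, as the formula assumes.

```python
import math

def welch_t_from_summary(mean_x, sd_x, n_x, mean_y, sd_y, n_y):
    """Unequal-variance two-sample t statistic from summary statistics."""
    return (mean_x - mean_y) / math.sqrt(sd_x ** 2 / n_x + sd_y ** 2 / n_y)

# Hypothetical data sets: means differ by 0.1, both standard deviations are 1.
t_small = welch_t_from_summary(10.1, 1.0, 100, 10.0, 1.0, 100)
t_large = welch_t_from_summary(10.1, 1.0, 10000, 10.0, 1.0, 10000)

for n, t in ((100, t_small), (10000, t_large)):
    print(f"n = {n:>5} per group: t = {t:.2f}")
```

For a fixed nonzero difference in means, t grows in proportion to sqrt(n), so the same small difference is not significant at n = 100 but is strongly significant at n = 10000.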
 
 
 
 If you still remember some time ago, we had a
 discussion on large sample size problem for tests
 for normality. When the sample size is large enough,
 the result can always be expected (for real data
 sets), that is, rejection of the null hypothesis.
 
 
 
 Cheers,
 
 
 
 Chaosheng
 

--
 
 Dr. Chaosheng Zhang
 Lecturer in GIS
 Department of Geography
 National University of Ireland, Galway
 IRELAND
 Tel: +353-91-524411 x 2375
 Direct Tel: +353-91-49 2375
 Fax: +353-91-525700
 E-mail: [EMAIL PROTECTED]
 Web 1: www.nuigalway.ie/geography/zhang.html
 Web 2: www.nuigalway.ie/geography/gis/index.htm
 


 
 
 
 
 
 - Original Message -
 
 From: Isobel Clark [EMAIL PROTECTED]
 To: Donald E. Myers [EMAIL PROTECTED]
 Cc: Colin Badenhorst [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Sent: Saturday, December 04, 2004 11:49 AM
 Subject: [ai-geostats] F and T-test for samples drawn from the same p
 
 
 
 
 
  Don

  Thank you for the extended clarification of F and t hypothesis
  tests. For those unfamiliar with the concept, it is worth noting
  that the F test for multiple means may be more familiar under the
  title Analysis of variance.

  My own brief answer was in the context of Colin's question, where
  it was quite clear that he was talking about the simplest F
  variance-ratio and t comparison of means test.

  Isobel