Re: [ai-geostats] F and T-test for samples drawn from the same p
Dear all,

I'm wondering if sample size (number of samples, n) is playing a role here.

Since Colin is using Excel to analyse several thousand samples, I have checked the functions of t-tests in Excel. In the Data Analysis Tools help, a function is provided for "t-Test: Two-Sample Assuming Unequal Variances". This function is the same as those in many textbooks (there are other forms of the function). Unfortunately, I cannot find the function for "assuming equal variances" in Excel, but I assume they are similar, and should be the same as those in some textbooks.

From the function, you can see that for any given difference between the two means, the t value grows as the sample size grows. When the sample size is large enough, even slight differences between the mean values of two data sets (x bar and y bar) can be detected, and this will result in rejection of the null hypothesis.

This is in fact quite reasonable. When the sample size is large, you are confident in the mean values (Central Limit Theorem), with a very small standard error (s/sqrt(n)). Therefore, you are confident in detecting the differences between the two data sets. Even though there is only a slight difference, you can still say, yes, they are "significantly" different.

If you remember, some time ago we had a discussion on the large sample size problem for tests for normality. When the sample size is large enough, the result can always be expected (for real data sets), that is, rejection of the null hypothesis.

Cheers,
Chaosheng

--
Dr. Chaosheng Zhang
Lecturer in GIS
Department of Geography
National University of Ireland, Galway
IRELAND
Tel: +353-91-524411 x 2375
Direct Tel: +353-91-49 2375
Fax: +353-91-525700
E-mail: [EMAIL PROTECTED]
Web 1: www.nuigalway.ie/geography/zhang.html
Web 2: www.nuigalway.ie/geography/gis/index.htm

----- Original Message -----
From: "Isobel Clark" [EMAIL PROTECTED]
To: "Donald E. Myers" [EMAIL PROTECTED]
Cc: "Colin Badenhorst" [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Saturday, December 04, 2004 11:49 AM
Subject: [ai-geostats] F and T-test for samples drawn from the same p

Don

Thank you for the extended clarification of F and t hypothesis tests. For those unfamiliar with the concept, it is worth noting that the F test for multiple means may be more familiar under the title "Analysis of variance".

My own brief answer was in the context of Colin's question, where it was quite clear that he was talking about the simplest F variance-ratio and t comparison of means tests.

Isobel

* By using the ai-geostats mailing list you agree to follow its rules ( see http://www.ai-geostats.org/help_ai-geostats.htm )
* To unsubscribe to ai-geostats, send the following in the subject or in the body (plain text format) of an email message to [EMAIL PROTECTED] Signoff ai-geostats
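Chaosheng's point about sample size can be checked numerically. The sketch below is only an illustration, not the Excel implementation: it evaluates the textbook unequal-variance (Welch) t statistic for hypothetical summary statistics (the means, standard deviations and sample sizes are invented) and shows how the same small difference in means gives a larger t as n increases.

```python
import math


def welch_t(mean_x, mean_y, sd_x, sd_y, n_x, n_y):
    """Welch (unequal-variance) two-sample t statistic from summary statistics."""
    standard_error = math.sqrt(sd_x**2 / n_x + sd_y**2 / n_y)
    return (mean_x - mean_y) / standard_error


# Hypothetical data sets whose means differ by 1% of the standard deviation.
for n in (100, 10_000, 1_000_000):
    t = welch_t(10.01, 10.00, 1.0, 1.0, n, n)
    print(f"n = {n:>9,d}   t = {t:.3f}")
```

The standard error shrinks like 1/sqrt(n), so for a fixed difference in means t grows like sqrt(n). This holds only if the samples are independent, which is the assumption questioned later in the thread.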
[ai-geostats] RE: F and T-test for samples drawn from the same p
Hence my recommendation to use cross validation.

Isobel
http://geoecosse.bizland.com/books.htm

--- Colin Daly [EMAIL PROTECTED] wrote:

Hi

Sorry to repeat myself - but the samples are not independent. Independence is a fundamental assumption of these types of tests - and you cannot interpret the tests if this assumption is violated. In the situation where spatial correlation exists, the true standard error is nothing like as small as the (s/sqrt(n)) that Chaosheng discusses - because the sqrt(n) depends on independence.

Again, as I said before, if the data has any type of trend in it, then it is completely meaningless to try and use these tests - and with no trend but some 'ordinary' correlation, you must find a means of taking the data redundancy into account or risk getting hopelessly pessimistic results (in the sense of rejecting the null hypothesis of equal means far too often).

Consider a trivial example. A one dimensional random function which takes constant values over intervals of length one - so, it takes the value a_0 in the interval [0,1[ then the value a_1 in the interval [1,2[ and so on (let us suppose that each a_n term is drawn at random from a gaussian distribution with the same mean and variance, for example). Next suppose you are given samples on the interval [0,2]. You spot that there seems to be a jump between [0,1[ and [1,2[ - so you test for the difference in the means. If you apply an F test you will easily find that the mean differs (and more convincingly the more samples you have drawn!). However, by construction of the random function, the mean is not different. We have been lulled into the false conclusion of differing means by assuming that all our data are independent.

Regards
Colin Daly
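Colin Daly's piecewise-constant example can be simulated. The sketch below makes several assumptions that are not in his message: a small measurement noise (`noise_sd`) is added so that the sample variances are not exactly zero, a Welch t-test is used to compare the two means, and the critical value is the normal approximation 1.96. All function and parameter names are hypothetical.

```python
import math
import random
import statistics


def rejects_equal_means(x, y, critical=1.96):
    """Welch t-test decision, using a normal approximation for the critical value."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return abs(statistics.fmean(x) - statistics.fmean(y)) / se > critical


def false_rejection_rate(n_per_interval=50, noise_sd=0.1, trials=1000, seed=0):
    """Share of realizations in which 'equal means' is rejected although it is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # One realization: both levels come from the same N(0, 1) distribution.
        a0, a1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # All samples in an interval share its level; the noise is an added assumption.
        x = [a0 + rng.gauss(0.0, noise_sd) for _ in range(n_per_interval)]
        y = [a1 + rng.gauss(0.0, noise_sd) for _ in range(n_per_interval)]
        rejections += rejects_equal_means(x, y)
    return rejections / trials


print(false_rejection_rate())
```

A test that respected its nominal 5% level would reject in about one realization out of twenty. Here the rejection rate is far higher, because the samples within each interval are almost perfectly correlated and carry little more information than a single value.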
RE: [ai-geostats] F and T-test for samples drawn from the same p
Hello,

I am currently principal investigator on a major NIH grant that aims to develop software for tests of hypothesis using alternate hypotheses that are specified by the user and that differ from the omnibus hypothesis of spatial independence; we called them spatial neutral models. For example, you can test for clusters of cancer rates above and beyond a regional background in exposure.

The p-values are computed using randomization, and I applied geostatistical simulation to generate multiple realizations that are then used to derive the empirical distribution of the test statistic.

I presented an example during the last GeoEnv conference and I put a PDF copy of the paper, which is in press for the moment, on my website.

Cheers,
Pierre

Dr. Pierre Goovaerts
President of PGeostat, LLC
Chief Scientist with Biomedware Inc.
710 Ridgemont Lane
Ann Arbor, Michigan, 48103-1535, U.S.A.
E-mail: [EMAIL PROTECTED]
Phone: (734) 668-9900
Fax: (734) 668-7788
http://alumni.engin.umich.edu/~goovaert/
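The randomization step Pierre describes can be sketched in general terms. The code below is not his software: it only shows one common way of turning a set of test statistics, computed on realizations simulated under a neutral model, into an empirical p-value. The function name and the numbers are hypothetical, and the simulation of the realizations themselves is not shown.

```python
def monte_carlo_p_value(observed, simulated):
    """One-sided empirical p-value from statistics simulated under the null model.

    The observed statistic is counted as one of the realizations, which keeps
    the p-value strictly above zero.
    """
    n_extreme = sum(1 for s in simulated if s >= observed)
    return (n_extreme + 1) / (len(simulated) + 1)


# Hypothetical statistics from 9 simulated realizations and one observed value.
simulated_stats = [0.8, 1.1, 0.4, 1.9, 0.7, 1.3, 0.9, 2.4, 1.0]
print(monte_carlo_p_value(2.0, simulated_stats))
```

Because the realizations are generated under the chosen neutral model, the reference distribution already reflects the spatial correlation in that model, instead of assuming independent samples.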
RE: [ai-geostats] F and T-test for samples drawn from the same p
Sorry if this is somewhat off subject - but I'd like to discuss (and invite further comments on) Colin's comments regarding the effects of independence on standard statistical tests. He mentioned that a lack of independence typically removes a large part of the usability of basic tests unless corrected for spatial variables.

The standard argument goes something like: 'Spatial autocorrelation means that the sampled values are not independent, so you have less information than you think (i.e. your estimated degrees of freedom are too large). Consequently, the variance is underestimated and confidence intervals are too small (or the type I error is under-reported)'.

My understanding is that this argument is quite valid when you are inferring beyond the area from which you have sampled (or inferring about the stochastic process generating the sample data). However, it's probably worth mentioning that if you are simply looking to compare the parameters of specified areas (or volumes) and you have used a sensible design-based sampling method (e.g. SRS), then autocorrelation poses no problem. That is, if you have randomly sampled some regionalized variable in volume X and volume Y, and simply wish to determine if, say, the population means of these volumes are different, then the sample points will be independent (relative to the area of inference). In this scenario, classical statistical tests can be used to compare the realization parameters of the different areas.

The question that often goes unasked is: what inference space are we interested in? Do we wish to discuss the process that generated the data, or simply make inference about the actual physical realization? Geostatistics avoids many complications with autocorrelation by typically restricting inference to the actual data, rather than the stochastic process.

In your particular case I would expect that statistically showing that:
(a) two horizons exhibit the same mineral content/spatial structure, and
(b) two horizons derive from the same process
are very different problems. Certainly within biology, the difference between these situations does not seem to be well understood - I am curious if geostatisticians distinguish between them as a matter of course?

regards,
Matthew Pawley
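Matthew's design-based argument can be illustrated with a small simulation. The sketch below rests on assumptions that are not in his message: the "realization" is a fixed AR(1) series of hypothetical size, the design is simple random sampling without replacement, and the interval is the classical mean plus or minus 1.96 standard errors. It estimates how often that interval covers the mean of the realization itself.

```python
import math
import random
import statistics


def srs_coverage(pop_size=10_000, n=100, trials=1000, phi=0.95, seed=1):
    """Coverage of the classical 95% interval for the realized population mean."""
    rng = random.Random(seed)
    # One fixed, strongly autocorrelated realization (AR(1) series).
    field = [rng.gauss(0.0, 1.0)]
    for _ in range(pop_size - 1):
        field.append(phi * field[-1] + rng.gauss(0.0, 1.0))
    true_mean = statistics.fmean(field)  # mean of this realization, not of the process

    covered = 0
    for _ in range(trials):
        sample = rng.sample(field, n)  # simple random sample without replacement
        half_width = 1.96 * statistics.stdev(sample) / math.sqrt(n)
        covered += abs(statistics.fmean(sample) - true_mean) <= half_width
    return covered / trials


print(srs_coverage())
```

The coverage should come out close to the nominal 95% even though neighbouring values of the series are strongly correlated, because the randomness comes from the sampling design. The same interval says nothing reliable about the mean of the process that generated the series.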
[ai-geostats] Large samples, t tests, etc
Most of the tests of hypotheses that have been mentioned recently on this listserv are non-spatial, i.e., there is nothing in the underlying statistical assumptions that specifically pertains to spatial data. The one common assumption is random sampling or iid (independent, identically distributed). In many typical (non-spatial) applications, this assumption is ensured by the design of the experiment, i.e., the way the data is generated and collected. Spatial data problems more often involve observational data, which does not easily lend itself to designing the experiment in such a way as to ensure this basic assumption.

In the case of spatial data, random site selection does not necessarily correspond to random sampling. In the case of the random function model implicit in most of geostatistics, the data is a non-random sample from one realization of the random function (in that context, using random site selection does not make it a random sample). Note that not all spatial statistical analysis methods are based on this random function model.

Normality is another common underlying assumption in many hypothesis tests. In the case of random sampling from a distribution with a finite moment of order 2+delta, delta > 0, the distribution of the (suitably standardized) sample mean will converge IN DISTRIBUTION to a normal distribution. This means that a sequence of functions is converging to another function. It is important to note that this convergence may be pointwise or uniform or uniform on intervals. Pointwise is what you usually get from the Central Limit Theorem; this means that the rate of convergence depends on where you are on the curve.

The difference between using a normal statistic and using a t-statistic usually is the difference between a known variance and an unknown (and hence estimated) variance. But in either case the variance is assumed to exist and be finite. The sample variance can always be computed from a data set, but that does not ensure that the variance of the distribution exists. The quotient of two independent standard normal random variables has a Cauchy distribution, for which neither the mean nor the variance is finite. Hence the Central Limit Theorem does not apply.

In the case of a non-normal distribution one really needs to know how robust the test is to deviation from normality; increasing the sample size does not really solve this problem.

Finally, note that most tests of hypotheses are not exactly neutral: there is a tendency to accept the null hypothesis UNLESS there is evidence against the null hypothesis. This is one of the reasons for the emphasis on the POWER of the test. Often the null hypothesis is the status quo, and this logical stance for the null and alternative hypotheses is okay, but not in all circumstances. However, in some tests for normality (which still depend on the assumption of random sampling) the test is set up in such a way that the null hypothesis corresponds to the conclusion of normality, e.g., Chi-square tests. If you are trying to argue that it is safe to assume normality then you want to accept the null hypothesis and you should want a very high power for the test; you don't want a small p-value, instead you want a very large p-value.

Note that the normal distribution is symmetric but not all symmetric distributions are normal.

Donald Myers
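Donald's remark about the Cauchy distribution can be checked by simulation. The sketch below is an illustration with hypothetical names and sample sizes: it draws ratios of two independent standard normal values, averages them in samples of size n, and measures the spread of those sample means by their interquartile range over many replicates.

```python
import random
import statistics


def iqr_of_sample_means(n, replicates=300, seed=2):
    """Interquartile range of the means of `replicates` samples of n Cauchy values."""
    rng = random.Random(seed)
    means = []
    for _ in range(replicates):
        # The ratio of two independent standard normals is standard Cauchy.
        ratios = [rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(ratios))
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


for n in (10, 1000):
    print(n, iqr_of_sample_means(n))
```

For a distribution with finite variance the spread of the sample mean would shrink by a factor of 10 between n = 10 and n = 1000. For the Cauchy case the mean of n values has the same standard Cauchy distribution as a single value, so the interquartile range stays near 2 however large n becomes.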
Re: [ai-geostats] F and T-test for samples drawn from the same p
In every resource model I have done, I subdivide the populations into those of equal mean and variance, so that stationarity is obeyed. Is this the correct procedure? I haven't read Mining Geostatistics in detail yet, but understood that this was a basic requirement for geostatistical modelling procedures.

http://www.users.on.net/~digbym/about_consulting.htm

Digby
Re: [ai-geostats] F and T-test for samples drawn from the same population
I believe a related topic is the proportional effect, which appears when populations display related but different properties, as discussed in Geostatistical Ore Reserve Estimation, M. David, p. 170, and which also shows up in a study of the normal and relative variograms.

Regards
Digby