RE: AI-GEOSTATS: Summary: Large sample size and normal distribution
Hi,

I'm not sure I agree with the idea that a test can be too powerful. This is a common argument in simulation experiments: because you can do an infinite number of replicate simulations, somehow the differences detected are not real. In fact, the differences are real. They may not be biologically (or geologically, or whatever field you are in) significant, but they are still real. That is why it is better to decide first on the magnitude of difference that you consider significant.

The null hypothesis is always false, although it might be false by a very small quantity; that is the trivial fact that the very large sample size illustrates in the common test of significance. The conclusion to be drawn from this is not that we must set in advance the amount of difference that we would find significant (a rather restrictive strategy which will be violated very often because it is nonsensical), but rather that the only sensible strategy is to compare hypotheses one against another. This can be done on an evidential basis by evaluating the likelihood ratio: the likelihood of the data under one hypothesis divided by the likelihood of the data under another hypothesis. By constructing the whole likelihood function (in the case of a single parameter), any pair of hypotheses can be compared through the value of the likelihood ratio.

Now, in the case of deviation from normality, I suppose you wouldn't have much intuition about what is significant, but the relevant question is: what is the effect of small deviations from normality on your test or on the conclusions of your analysis? Perhaps a better question is what the data say about a given hypothesis for the mean versus another value for the mean, assuming the normal distribution is true. If the variance is unknown there is a simple solution only for the normal and a few other cases, by orthogonalization, and then the two parameters can be assessed separately. For comparing two different models, say normal versus lognormal, a likelihood-based approach, the Akaike Information Criterion, is available, although I am not sure that Akaike's approach is fully in agreement with the likelihood principle.

Ruben
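As a rough illustration of the likelihood-based comparison Ruben describes, here is a minimal Python sketch (not from the original post; the data are simulated and the parameter values arbitrary) that fits a normal and a lognormal model to the same positive-valued data and compares them by log-likelihood ratio and AIC:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.lognormal(mean=1.0, sigma=0.5, size=5000)   # illustrative data only

# Maximum-likelihood fits: normal vs lognormal (2 parameters each)
mu, sd = x.mean(), x.std()
ll_norm = stats.norm.logpdf(x, loc=mu, scale=sd).sum()

logx = np.log(x)
mu_l, sd_l = logx.mean(), logx.std()
ll_lnorm = stats.lognorm.logpdf(x, s=sd_l, scale=np.exp(mu_l)).sum()

# Evidence: log-likelihood ratio, plus AIC = 2k - 2*logL with k = 2 for both models
print("log-likelihood ratio (lognormal vs normal):", ll_lnorm - ll_norm)
print("AIC normal   :", 2 * 2 - 2 * ll_norm)
print("AIC lognormal:", 2 * 2 - 2 * ll_lnorm)

Because both models have the same number of parameters here, the AIC comparison reduces to the likelihood ratio itself; AIC earns its keep when the competing models differ in complexity.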
Re: AI-GEOSTATS: Simulation and trends
Adrian,

Thank you for the reminder of one of the strengths of Turning Bands. Certainly I have no argument with your points. However, Chris' question was about how to include trend in SGS, and that is what my answer is about.

Isobel
http://ecosse.ontheweb.com
Re: AI-GEOSTATS: plurigaussian simulations
Hi, Adrian - see the textbook by M. Armstrong, Galli et al.: Plurigaussian Simulation in Geosciences. Springer Verlag, 2003 (includes a CD with demo software), or Lantuejoul, C.: Geostatistical Simulation - Models and Algorithms. Springer, 2002.

Heinz Burger
--
Dr. Heinz Burger
Freie Universitaet Berlin - Geoinformatik -
Malteserstr. 74-100, 12249 BERLIN, Germany
Tel. (49) 30-838-70561 Fax: (49) 30-838-70723
mailto: [EMAIL PROTECTED]
Website: http://userpage.fu-berlin.de/~hburger/hb
AI-GEOSTATS: Summary: Large sample size and normal distribution
Dear All,

One week ago I posted a question about large n and the normal distribution, and I have received several good replies from Isobel Clark, Ned Levine, Ruben Roa Ureta, Thies Dose, Chris Hlavka, Donald Myers and Jeffrey Blume. Jeffrey is perhaps not on the list, but I assume he has no objections if I copy his message to the list.

Generally speaking, when n is very large, e.g., n > 1,000, which is very common in geochemistry nowadays, statistical (goodness-of-fit) tests become too powerful, and the p-values are less informative. Therefore, users need to be very careful in using these tests with a large n. Suggestions to solve this problem include: (1) use graphical methods; (2) develop methods which are suitable for large n; (3) use methods which are not sensitive to n. The solutions may not be very satisfactory, but I do hope statisticians pay more attention to large samples, as they have been paying too much attention to small ones. More personal discussion is welcome. If you need some data sets to play with, please feel free to get in touch with me.

Please find below the original question and the replies. I would like to express my sincere thanks to all those who replied to me (I hope nobody is missing from the above list).

Cheers,
Chaosheng
--
Dr. Chaosheng Zhang
Lecturer in GIS
Department of Geography
National University of Ireland, Galway
IRELAND
Tel: +353-91-524411 x 2375
Fax: +353-91-525700
E-mail: [EMAIL PROTECTED]
Web 1: www.nuigalway.ie/geography/zhang.html
Web 2: www.nuigalway.ie/geography/gis/index.htm

- Original Message -

Dear list,

I'm wondering if anyone out there has experience of dealing with the probability distribution of data sets of a large sample size, e.g., n > 10,000. I am studying the probability features of chemical element concentrations in a USGS sediment database with a sample number of around 50,000, and have found that it is virtually impossible for any real data set to pass tests for normality, as the tests become too powerful with the increase of sample size. It is widely observed that geochemical data do not follow a normal or even a lognormal distribution. However, I feel that the large sample size is also making trouble. I am looking for references on this topic. Any references or comments are welcome.

Cheers,
Chaosheng

---

Chaosheng,

Your problem may be 'non-stationarity' rather than the large sample size. If you have so many samples, you are probably sampling more than one 'population'. We have had success in fitting lognormals to mining data sets of up to half a million, where these are all within the same geological environment and primary mineralisation. We have also had a lot of success with reasonably large data sets (up to 100,000) in fitting mixtures of two, three or four lognormals (or Normals) to characterise different populations. See, for example, the paper given at the Australian Mining Geology conference in 1993 on my page at http://drisobelclark.ontheweb.com/resume/Publications.html

Isobel
http://ecosse.ontheweb.com

---

Chaosheng,

Can't you do a Monte Carlo simulation for the distribution? In S-Plus, you can create confidence intervals from a MC simulation with a sample size as large as you have. That is, you draw 50,000 or so points from a normal distribution and calculate the distribution. You then re-run this a number of times (e.g., 1000) to establish approximate confidence intervals. You can then check what proportion of your data points fall outside the approximate confidence intervals; you would expect no more than 5% or so of the data points to fall outside the intervals if your distribution is normal. If more than 5% fall outside, then you really don't have a normal distribution. (Since a normal distribution is essentially a random distribution, I would doubt that any real data set would be truly normal - the sampling distribution is another issue.) Anyway, just some thoughts. Hope everything is well with you.

Regards,
Ned

---

I presume your null hypothesis is that the data come from the given distribution, as is usual in goodness-of-fit tests. If such is the case, your sample size will almost surely lead to rejection. The well-known logical inconsistencies of the standard test of hypothesis based on the p-value are magnified under large n. You have at least these options:

1) Find some authority that says that for large sample sizes the p-value is less informative, e.g. Lindley and Scott. 1984. New Cambridge Elementary Statistical Tables. Cambridge Univ Press; and then you can throw away your goodness-of-fit test. But be warned that equally important authorities have said exactly the contrary, that the force of the p-value is stronger for large sample sizes (Peto et al. 1976. British Medical Journal 34:585-612). To make
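Ned's Monte Carlo envelope suggestion can be sketched roughly as follows (my own Python illustration, not his S-Plus code; the data here are simulated as a stand-in for a real data set): simulate many normal samples of the same size, build a pointwise envelope on the sorted values, and see how much of the real (sorted) data falls outside it.

import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_t(df=10, size=50_000)       # stand-in for the real data
n = data.size
mu, sd = data.mean(), data.std(ddof=1)

B = 200                                         # number of replicate simulations
sims = np.sort(rng.normal(mu, sd, size=(B, n)), axis=1)
lo, hi = np.percentile(sims, [2.5, 97.5], axis=0)   # pointwise 95% envelope

srt = np.sort(data)
outside = np.mean((srt < lo) | (srt > hi))
print(f"fraction of sorted data outside the 95% envelope: {outside:.3f}")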
RE: AI-GEOSTATS: Summary: Large sample size and normal distribution
Hi,

I'm not sure I agree with the idea that a test can be too powerful. This is a common argument in simulation experiments: because you can do an infinite number of replicate simulations, somehow the differences detected are not real. In fact, the differences are real. They may not be biologically (or geologically, or whatever field you are in) significant, but they are still real. That is why it is better to decide first on the magnitude of difference that you consider significant.

Now, in the case of deviation from normality, I suppose you wouldn't have much intuition about what is significant, but the relevant question is: what is the effect of small deviations from normality on your test or on the conclusions of your analysis? These kinds of studies are out there in the statistical literature for many tests (t-tests etc.) -- I'm not sure how much has been done to look at the robustness of geostatistical analyses, but there are probably some studies (does anyone know?). I would not opt for a less powerful test just to justify an assumption - that's, like, unethical or something.

Yetta
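One way to get at the practical question raised here -- what do small departures from normality actually do to your conclusions -- is a small simulation of the type I error rate of a t-test on mildly skewed data. The Python sketch below is my own illustration and assumes nothing about any real data set:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha = 50, 5000, 0.05
true_mean = np.exp(0.25 ** 2 / 2)     # mean of a lognormal(0, 0.25) variable

rejections = 0
for _ in range(reps):
    x = rng.lognormal(mean=0.0, sigma=0.25, size=n)   # mildly skewed data
    _, p = stats.ttest_1samp(x, popmean=true_mean)    # H0 is in fact true
    rejections += p < alpha

print("empirical type I error rate:", rejections / reps)   # compare with 0.05

If the empirical rejection rate stays close to the nominal 0.05, the mild non-normality is doing little harm to that particular inference, whatever a goodness-of-fit test says.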
AI-GEOSTATS: 3D variogram analysis.
Dear All,

I am studying geostatistical reserve estimation and grade distribution, and I need some information about variograms. Anisotropy axes can be found easily in 2D. But in mining, the data include x, y, z coordinates for each datum (for example, a drilling log split into three parts, or composited data), and thus the study requires three-dimensional variogram analysis. I have tried anisotropy axes in the x-y plane and in the x-z or y-z (vertical) planes. But there can be anisotropy in a direction that is, for example, diagonal to the x-y plane. There is a huge number of possible anisotropy directions in 3D. I have read many geostatistical papers but I am not satisfied. Could you please give me some information about 3D variogram analysis, or point me to related papers?

Thanks in advance.
Tayfun Yusuf YÜNSEL
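For what it is worth, the mechanics of a 3D directional experimental variogram are compact enough to sketch. The Python code below is my own illustration (not from any particular paper); the azimuth/dip convention, tolerances and synthetic data are assumptions. Pairs are kept when their separation vector lies within an angular tolerance of a chosen azimuth/dip direction, and the semivariance is averaged within lag bins; directions are then scanned to look for anisotropy.

import numpy as np

def directional_variogram(xyz, values, azimuth_deg, dip_deg,
                          lag_width, n_lags, angle_tol_deg=22.5):
    az, dip = np.radians(azimuth_deg), np.radians(dip_deg)
    # unit vector of the chosen direction (azimuth measured from the y axis, dip positive down)
    d = np.array([np.sin(az) * np.cos(dip),
                  np.cos(az) * np.cos(dip),
                  -np.sin(dip)])
    diffs = xyz[:, None, :] - xyz[None, :, :]          # all pairwise separation vectors
    h = np.linalg.norm(diffs, axis=-1)
    with np.errstate(invalid="ignore", divide="ignore"):
        cosang = np.abs(np.einsum("ijk,k->ij", diffs, d)) / h
    keep = (h > 0) & (cosang >= np.cos(np.radians(angle_tol_deg)))

    sq = 0.5 * (values[:, None] - values[None, :]) ** 2    # semivariance of each pair
    lags, gammas = [], []
    for k in range(n_lags):
        m = keep & (h >= k * lag_width) & (h < (k + 1) * lag_width)
        if m.any():
            lags.append(h[m].mean())
            gammas.append(sq[m].mean())
    return np.array(lags), np.array(gammas)

# Example with synthetic drillhole-like data
rng = np.random.default_rng(2)
xyz = rng.uniform(0, 100, size=(300, 3))
vals = rng.normal(size=300)
lags, gam = directional_variogram(xyz, vals, azimuth_deg=45, dip_deg=30,
                                  lag_width=10.0, n_lags=8)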
RE: AI-GEOSTATS:
Look in any good analytical cartography textbook you can find - in the discussion of symbolization, they should have the formula.

-----Original Message-----
From: tamas
To: [EMAIL PROTECTED]
Sent: 8/9/03 7:07 AM
Subject: AI-GEOSTATS:

Dear Members,

Where can I find more details about the Jenks (natural breaks) optimization used by ArcView/ESRI for classification? Is this completely different from Kolmogorov-Smirnov?

Thanks for the help.
Janos Tamas
University of Debrecen
http//gisserver1.date.hu
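For readers without a cartography textbook to hand, the optimization itself is easy to state: choose class breaks that minimise the total within-class sum of squared deviations from the class means. Below is a minimal Python sketch of that generic Fisher-Jenks dynamic program (my own illustration, not ESRI's actual implementation, and only practical for modest n):

import numpy as np

def jenks_breaks(values, k):
    """Optimal 1-D classification into k classes, minimising the total
    within-class sum of squared deviations from the class means."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    # prefix sums give the within-class sum of squares of any slice x[i:j] in O(1)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    csum2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def ssd(i, j):
        s, s2, m = csum[j] - csum[i], csum2[j] - csum2[i], j - i
        return s2 - s * s / m

    # dp[c][j]: minimal cost of splitting the first j values into c classes
    dp = np.full((k + 1, n + 1), np.inf)
    cut = np.zeros((k + 1, n + 1), dtype=int)
    dp[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cost = dp[c - 1][i] + ssd(i, j)
                if cost < dp[c][j]:
                    dp[c][j], cut[c][j] = cost, i

    # recover the upper bound of each class
    breaks, j = [], n
    for c in range(k, 0, -1):
        breaks.append(x[j - 1])
        j = cut[c][j]
    return sorted(breaks)

print(jenks_breaks([1, 2, 3, 11, 12, 13, 24, 25, 26], k=3))   # -> [3.0, 13.0, 26.0]

Note the contrast with Kolmogorov-Smirnov, which is a goodness-of-fit test against a reference distribution, not a classification rule.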
Re: AI-GEOSTATS: LAI (leaf area index) from NDVI
You might want to consider the implications of using data with different supports - 3x3 neighborhoods and points. The 3x3 neighborhoods are probably larger than the areas associated with the LAI field values. Thus the 3x3 mean NDVIs can be considered to be estimates of the NDVIs at the points, estimates that carry error. Error in the independent (x) variable leads to underestimated correlation and (ordinary least squares, OLS, regression) slope. There are a number of alternatives to OLS, such as MA and RMA regression, that might lead to improved slope estimates, but I prefer to correct the correlation and slope estimates using estimates of the precision of the x variable.

You can roughly estimate the precision of the point NDVI by: 1) calculating the standard deviation of the nine pixel NDVIs associated with each field observation, 2) plotting the standard deviations versus the means to check for dependence of the variation on magnitude, and 3) estimating the standard error of NDVI as the mean standard deviation if there is no dependence; otherwise consider regression of LAI versus log(NDVI), or log-log regression. Note that if the point area is much smaller than a pixel, the x error will be underestimated - but fixing this would involve either a geostatistical analysis of point values for NDVI or estimation involving fractal analysis.

The error estimate can then be used to correct the estimates of correlation and slope (my apologies for cutting and pasting from a Word file so that subscripts and superscripts were lost, and maybe also the Greek font).

Preliminaries. First, consider regression of variable x (the independent variable) versus y (the dependent variable). The usual formula for the slope is:

  sum_i [(x_i - m_x)*(y_i - m_y)] / sum_i (x_i - m_x)^2     (1)

where summation is over the index i for individual data points, and the means are m_x and m_y. This formula (section 1.2 in N. Draper and H. Smith, Applied Regression Analysis, John Wiley & Sons, Inc., New York, 1966) is correct, and computationally simple and accurate, that is, it works well to preserve floating point accuracy. However, formulae involving descriptive statistics (the correlation or covariance of x and y, and the standard deviations of x and y) convey more information about the factors related to the slope:

  cor(x,y)*s_y/s_x   or   cov(x,y)/s_x^2     (2)

where one can see that the magnitude of the slope increases with the correlation and with the range of the dependent variable y (as measured by the standard deviation), and decreases with the range of the independent variable. If one of the formulae in (2) is used with n data points, it will be accurate (unbiased) if multiplied by the square root of (n-2)/(n-1), to correct for the effect of using estimated rather than true means, and if the usual assumptions, including accurate values for the independent variable, hold. If the range of the independent variable is inflated by errors, the slope will decrease, that is, it will be biased low.

Predicting the slope when precise values of the independent variable x are replaced by estimated or measured values x', following Section 29.56 in M. Kendall and A. Stuart, The Advanced Theory of Statistics: Volume 2: Inference and Relationship, 4th Edition, Charles Griffin & Company Limited, London, 1979 (copy in your mailbox):

Let's assume that the measurements are made without bias and with a precision represented as a standard deviation of the error: the observed measurements (x'_1, x'_2, ...) of the independent variable can be considered as sums of the true values (x_1, x_2, ..., with zero standard error) plus errors (d_1, d_2, ...) with average 0 and standard deviation s_d. The least squares regression slope is then

  cov(x',y)/s_x'^2 = cov(x,y)/(s_x^2 + s_d^2),

where cov(x,y) is the covariance between x and y, i.e. the correlation times the product of the standard deviations of x and y. Now if the least squares slope with no errors is cov(x,y)/s_x^2 = 1, then the slope with the errors is:

  cov(x,y)/(s_x^2 + s_d^2) = (cov(x,y)/s_x^2) * [s_x^2/(s_x^2 + s_d^2)]
                           = s_x^2/(s_x^2 + s_d^2)
                           = 1/[1 + (s_d/s_x)^2]     (3)

The expression on the right is a function of the relative magnitude s_d/s_x of the measurement error to the data range of the independent variable, where the standard deviation is the metric. The range term s_x can be approximated by s_x', the standard deviation of the measured values, if the range of the measurements is large compared to the measurement errors. Otherwise, correct for the effect of measurement error by using s_x^2 = s_x'^2 - s_d^2, leading (as you have noted) to (s_x'^2 - s_d^2)/s_x'^2 as the predicted slope. The estimate of s_d is generally known from an independent source, such as instrument specs or a calibration analysis.

You will note that the slope predicted with (3) is always less than 1. The mathematical cause is the inflation of the denominator from s_x^2 to s_x^2 + s_d^2. Perhaps what is counter-intuitive is that the slope is biased due to mean zero
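A compact numerical version of the correction described above, in Python (my own sketch; the variable names are hypothetical, and the use of the nine-pixel standard deviations as the error estimate follows the suggestion in this post and is an assumption, not a fixed recipe):

import numpy as np

def corrected_regression(ndvi_mean, ndvi_sd_3x3, lai):
    """ndvi_mean: mean NDVI per 3x3 window (measured x); ndvi_sd_3x3: std. dev.
    of the nine pixels in each window; lai: field LAI at the matching points (y)."""
    x, y = np.asarray(ndvi_mean, float), np.asarray(lai, float)

    s_x2 = x.var(ddof=1)                       # variance of the measured x values
    s_d = np.mean(np.asarray(ndvi_sd_3x3))     # error estimate: mean 9-pixel std. dev.
    s_d2 = s_d ** 2                            # assumes s_d2 < s_x2

    slope_ols = np.cov(x, y, ddof=1)[0, 1] / s_x2
    r_ols = np.corrcoef(x, y)[0, 1]

    atten = (s_x2 - s_d2) / s_x2               # predicted attenuation, eq. (3) with s_x ~ s_x'
    slope_corr = slope_ols / atten             # disattenuated slope
    r_corr = r_ols / np.sqrt(atten)            # disattenuated correlation
    return slope_ols, slope_corr, r_ols, r_corr

The correction blows up as the error variance approaches the spread of the measured x, which is the numerical face of the warning above about small measurement ranges.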
AI-GEOSTATS: Call for papers Int Colloquium on LUCC and Env Problems 19-21 December 2003
FYI

Dr Ernan Rustiadi
Laboratory of Land Resources Development Planning
Department of Soil Sciences, Faculty of Agriculture
Bogor Agricultural University (IPB)
Jl. Meranti, Darmaga Campus of IPB. Bogor, INDONESIA
Tel./Fax: +62-251-422322
email: [EMAIL PROTECTED], [EMAIL PROTECTED]
http://www.hdp-ina.org, http://www.hdp-ina.net, http://www.rawg.org

Call for Papers: LUCC and Environmental Problems
International Colloquium, 19-21 December 2003

Uncontrolled exploitation of natural resources is currently reaching an alarming level. The environmental problems created by such exploitation can have consequences of disastrous magnitude. The integrity and conservation of the environment, coupled with sustainable development, is very important for the progress of the nation and for industrial growth. Understanding the dynamics of land use and land cover change processes plays an important role in improving the understanding of the dynamic interactions between human activities and natural resource utilization. The two-day Colloquium 2003 on land use and land cover change will bring together researchers, academics, managers and planners as well as critics from different universities, industries, organizations and agencies from all over the world to participate in the on-going discussions and present their findings and opinions for the promotion of broad knowledge of natural and environmental systems, LUCC, and its issues.

OBJECTIVES
The main purposes of this Colloquium are:
- To present research findings related to land use and land cover change
- To discuss and evaluate the latest methods and issues of land use and land cover change analysis
- To promote co-operation and networking among scientists, experts, practitioners and researchers involved in addressing the latest land use and land cover change

SESSIONS
The scope of this Colloquium covers the relationship between land use and land cover change and: urbanization, deforestation, food problems, water problems.

VENUE
The colloquium will be held at the Hotel Salak and the conference centre of Bogor Agriculture University, situated in a peaceful setting near the Bogor Botanical Garden in the southern part of Jakarta. The colloquium is open to 70 active participants. Participation is charged. Potential participants should refer to the registration form.

ABSTRACT
Contributors should submit a title and a 150-word abstract of their paper by September 30, 2003, by e-mail to [EMAIL PROTECTED], [EMAIL PROTECTED], or [EMAIL PROTECTED], or visit http://www.hdp-ina.org, http://www.hdp-ina.net, http://www.raug.org. Authors of selected abstracts will be notified by October 20, 2003. Full papers should be received before November 30, 2003. Papers presented at the colloquium will be considered for publication in a book.

GUIDANCE FOR AUTHORS
Papers should be written in English, on A4 paper, in Times New Roman font (12 pt), in two columns (except abstracts). Papers should contain the title, the authors and their e-mail addresses, background, methods, discussion, conclusions and references. Abstracts can be sent to the Organizing Committee by e-mail (as an attachment) or by post.

REGISTRATION
Participants can enrol by sending the completed registration form to the Organizing Committee by fax, e-mail or post.
Re: AI-GEOSTATS: Summary: Large sample size and normal distribution
Dear Yetta,

Thanks for the comments, and I agree with you. I think there is a functional relationship between sample size and statistical power: the power increases with the increase of n. It's true that it is hard to define how powerful is "too powerful". Some people suggest using a lower significance level for large n. However, it is then a problem how low (e.g., 0.001) is low enough. Some people suggest not using the p-value at all, as mentioned in the summary.

It is also a question how serious it is if the data set does not follow a normal distribution. Statisticians may provide us with artificial examples showing how serious it can be, but this may not be so serious in the real world if it is only a minor departure. Some people even say that statistical methods cannot be used at all because our samples are not independent, due to spatial autocorrelation. Well, perhaps I have gone too far, but it is an interesting topic. (Geo)statisticians may have better comments.

By the way, I may not summarize again. If anyone would like to share ideas with the list, please copy to it.

Cheers,
Chaosheng

- Original Message -
From: "zij" [EMAIL PROTECTED]
To: "ai-geostats" [EMAIL PROTECTED]; "Chaosheng Zhang" [EMAIL PROTECTED]
Sent: Monday, August 11, 2003 7:13 PM
Subject: RE: AI-GEOSTATS: Summary: Large sample size and normal distribution

Hi, I'm not sure I agree with the idea that a test can be too powerful. This is a common argument in simulation experiments: because you can do an infinite number of replicate simulations, somehow the differences detected are not real. In fact, the differences are real. They may not be biologically (or geologically, or whatever field you are in) significant, but they are still real. That is why it is better to decide first on the magnitude of difference that you consider significant. Now, in the case of deviation from normality, I suppose you wouldn't have much intuition about what is significant, but the relevant question is: what is the effect of small deviations from normality on your test or on the conclusions of your analysis? These kinds of studies are out there in the statistical literature for many tests (t-tests etc.) -- I'm not sure how much has been done to look at the robustness of geostatistical analyses, but there are probably some studies (does anyone know?). I would not opt for a less powerful test just to justify an assumption - that's, like, unethical or something.

Yetta
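The point that the power of a goodness-of-fit test grows with n is easy to see numerically. The quick Python illustration below (mine, not from the thread) applies a normality test to samples from a distribution that is only slightly heavier-tailed than the normal; the p-values tend to shrink dramatically as n grows, even though the departure from normality stays the same.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
for n in (100, 1_000, 10_000, 50_000):
    x = rng.standard_t(df=20, size=n)    # very close to normal, slightly heavy tails
    stat, p = stats.normaltest(x)        # D'Agostino-Pearson omnibus normality test
    print(f"n = {n:>6}:  p = {p:.3g}")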
Re: AI-GEOSTATS: programming ArcView GIS
Hi Ellen,

Every program written for ArcView uses the Avenue language (there are a lot of books from ESRI regarding Avenue programming). However, ArcGIS 8.x uses ArcObjects (practically Visual Basic). ESRI itself has published a book on translating Avenue scripts into ArcObjects scripts. You should also consider that many .apr scripts are protected (for copyright reasons), and thus you cannot read the code directly.

Luigi Maiorano
PhD Student
College of Natural Resources
University of Idaho, Moscow 83843, Idaho (USA)
[EMAIL PROTECTED]

- Original Message -
From: Ellen De Beuckeleer [EMAIL PROTECTED]
Date: Thursday, August 7, 2003 2:57 pm
Subject: AI-GEOSTATS: programming ArcView GIS

Dear List-members,

How can I program applications for ArcView GIS? The book Statistical Analysis with ArcView GIS, by Jay Lee and David Wong, comes with some example scripts, which have the file extension .apr. Unfortunately these files only work with the 3.x version of ArcView. I am using version 8, and for my PhD I would like to learn how to program applications for ArcView GIS, especially the Moran's I index. In which language are .apr files constructed? Are there any good books concerning this issue? Where do I start?

Greets,
Ellen
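Whatever language is used (Avenue, ArcObjects or anything else), the statistic Ellen mentions is compact. A minimal sketch of global Moran's I in Python (my own illustration, not Avenue or ArcObjects code), assuming a precomputed n x n spatial weights matrix W with a zero diagonal:

import numpy as np

def morans_i(values, W):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    n = z.size
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

The real work in any GIS implementation is building the weights matrix (contiguity or distance based), not the formula itself.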
AI-GEOSTATS: Cluster Analysis
I have a point data set (~150k records) that contains a ratio variable (percentages, both positive and negative) that I would like to run a local cluster analysis on. Global Moran's I and Geary's C indicate clustering as a whole; I just need a statistical measure of where. If I understand correctly (a big if at this point), a local Moran's I must be aggregated by region or represent continuous data, which for various reasons I would like to stay away from. If I can correctly define a lag distance, would a local G statistic be better? What other tests may be appropriate for local cluster analysis of discrete point data, without having to use predefined regions or aggregate the data?

Wes Highfield
Graduate Research Assistant
Department of Landscape Architecture and Urban Planning
Texas A&M University
College Station, TX 77843-3137
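If it helps to see what a local G calculation involves once a lag distance is chosen, here is a rough from-scratch Python sketch (my own illustration; far too slow as written for 150k points, and dedicated packages such as PySAL handle this properly) of a local Getis-Ord Gi* with a distance-band neighbourhood and a conditional permutation test:

import numpy as np
from scipy.spatial import cKDTree

def local_g_star(coords, values, band, n_perm=999, seed=0):
    """Local Gi* per point: share of the total attribute found within `band`
    of each point, with a one-sided conditional permutation pseudo p-value."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = len(values)
    tree = cKDTree(coords)
    neighbors = tree.query_ball_point(coords, r=band)   # each list includes the point itself (Gi*)

    total = values.sum()
    g = np.empty(n)
    p = np.empty(n)
    for i, nb in enumerate(neighbors):
        obs = values[nb].sum() / total                  # observed Gi*
        g[i] = obs
        # conditional permutation: hold value i fixed, shuffle the others into the neighbourhood
        others = np.delete(np.arange(n), i)
        k = len(nb) - 1                                  # neighbours excluding i itself
        sims = np.empty(n_perm)
        for s in range(n_perm):
            pick = rng.choice(values[others], size=k, replace=False)
            sims[s] = (values[i] + pick.sum()) / total
        p[i] = (np.sum(sims >= obs) + 1) / (n_perm + 1)  # one-sided (hot-spot) pseudo p-value
    return g, p

Because the permutation test is repeated at every point, some correction for multiple testing (or a false discovery rate approach) is usually applied before mapping the "significant" clusters.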