RE: AI-GEOSTATS: Summary: Large sample size and normal distribution

2003-08-14 Thread Ruben Roa Ureta
> Hi,
>
> I'm not sure I agree with the idea that a test can be too powerful. This
> is a common argument in simulation experiments: that because you can do an
> infinite number of replicate simulations, somehow the differences
> detected are not real. In fact, the differences are real. They may not
> be biologically (or geologically or whatever field you are in)
> significant, but they are still real. That is why it is better to decide
> first on the magnitude of difference that you consider significant.

The null hypothesis is always false, although it might be false by a very
small quantity; that is the trivial fact that a very large sample size
exposes in the common test of significance. The conclusion to be drawn
from this is not that we must set in advance the amount of difference that
we would find significant (a rather restrictive strategy which will be
violated very often because it is nonsensical), but rather that the only
sensible strategy is to compare hypotheses one against another. This can
be done on an evidential basis by evaluating the likelihood ratio: the
likelihood of the data under one hypothesis divided by the likelihood of
the data under another hypothesis. By constructing the whole likelihood
function (in the case of a single parameter) any pair of hypotheses can be
tested for the value of the likelihood ratio.
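
As an illustration, a minimal sketch of this likelihood-function approach
in Python/scipy (the simulated data and the sigma fixed at its sample
estimate are my assumptions, not part of the original message):

import numpy as np
from scipy import stats

x = stats.norm.rvs(loc=10.0, scale=2.0, size=500, random_state=0)  # stand-in data
s = x.std(ddof=1)   # treat sigma as known at its sample estimate

def loglik(mu):
    # log-likelihood of the data under a hypothesized mean mu
    return stats.norm.logpdf(x, loc=mu, scale=s).sum()

# the whole likelihood function over a grid of hypothesized means
mus = np.linspace(9.5, 10.5, 201)
ll = np.array([loglik(m) for m in mus])
print("maximum-likelihood mean:", mus[ll.argmax()])

# evidence for mu = 10.0 against mu = 10.2, as a likelihood ratio
print("LR =", np.exp(loglik(10.0) - loglik(10.2)))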

> Now, in the case of deviation from normality, I suppose you wouldn't
> have much intuition about what is significant, but the relevant question
> is what is the effect of small deviations from normality on your test or
> conclusions of your analysis?

Perhaps a better question is what the data say about a given hypothesis
for the mean versus another value for the mean, assuming the normal
distribution is true. If the variance is unknown there is a simple
solution only for the normal and a few other cases, by orthogonalization,
and then the two parameters can be assessed separately. For comparing two
different models, say normal versus lognormal, a likelihood-based
approach, the Akaike Information Criterion, is available, although I am
not sure that Akaike's approach is fully in agreement with the likelihood
principle.
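
A sketch of such an AIC comparison, again in Python/scipy (the lognormal
test data and the two-parameter maximum-likelihood fits are illustrative
assumptions):

import numpy as np
from scipy import stats

# simulated positive-valued data, purely for illustration
x = stats.lognorm.rvs(s=0.5, scale=np.exp(1.0), size=1000, random_state=0)

# normal fit: MLEs are the sample mean and (ddof=0) standard deviation
mu, sd = x.mean(), x.std(ddof=0)
ll_norm = stats.norm.logpdf(x, loc=mu, scale=sd).sum()

# lognormal fit: MLEs are the mean and SD of log(x)
lmu, lsd = np.log(x).mean(), np.log(x).std(ddof=0)
ll_lnorm = stats.lognorm.logpdf(x, s=lsd, scale=np.exp(lmu)).sum()

# AIC = 2k - 2*log-likelihood, with k = 2 parameters for each model here
for name, ll in (("normal", ll_norm), ("lognormal", ll_lnorm)):
    print(f"{name:>10}: AIC = {2 * 2 - 2 * ll:.1f}")   # lower is better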

Ruben



Re: AI-GEOSTATS: Simulation and trends

2003-08-14 Thread Isobel Clark
Adrian

Thank you for the reminder of one of the strengths of
Turning Bands. Certainly I have no argument with your
points. However, Chris' question was about how to
include trend in SGS, and that is what my answer is
about.

Isobel
http://ecosse.ontheweb.com





Re: AI-GEOSTATS: plurigaussian simulations

2003-08-14 Thread Heinz Burger
Hi, Adrian -

see the textbook by

M. Armstrong, Galli et al.: Plurigaussian Simulation in Geosciences.
Springer Verlag, 2003 (includes a CD with demo software).

or

Lantuejoul, C.: Geostatistical Simulation - Models and Algorithms.
Springer, 2002.

Heinz Burger
-- 
*
Dr. Heinz Burger
Freie Universitaet Berlin
- Geoinformatik -
Malteserstr. 74-100
12249 BERLIN, Germany
Tel. (49) 30-838-70561 Fax: (49) 30-838-70723
mailto: [EMAIL PROTECTED]
Web page: http://userpage.fu-berlin.de/~hburger/hb




AI-GEOSTATS: Summary: Large sample size and normal distribution

2003-08-14 Thread Chaosheng Zhang


Dear All,

One week ago I posted a question about large n and normal distribution, and
have got several good replies from Isobel Clark, Ned Levine, Ruben Roa
Ureta, Thies Dose, Chris Hlavka, Donald Myers and Jeffrey Blume. Jeffrey is
perhaps not on the list, but I assume he has no objections if I copy his
message to the list.

Generally speaking, when n is too large, e.g., n > 1,000, which is very
common in geochemistry nowadays, statistical (goodness-of-fit) tests become
too powerful, and the p-values are less informative. Therefore, users need
to be very careful in using these tests with a large n. Suggestions to
solve this problem include: (1) to use graphical methods; (2) to develop
methods which are suitable for large n; (3) to use methods which are not
sensitive to n.

Well, the solutions may not be very satisfactory, but I do hope
statisticians pay more attention to large n, as they have been paying too
much attention to small ones. More personal discussions are welcome. If you
need some data sets to play with, please feel free to get in touch with me.

Please find below the original question and the replies. I would like to
express my sincere thanks to all those who replied (I hope nobody is
missing from the above list).

Cheers,
Chaosheng

--
Dr. Chaosheng Zhang
Lecturer in GIS
Department of Geography
National University of Ireland, Galway
IRELAND
Tel: +353-91-524411 x 2375
Fax: +353-91-525700
E-mail: [EMAIL PROTECTED]
Web 1: www.nuigalway.ie/geography/zhang.html
Web 2: www.nuigalway.ie/geography/gis/index.htm

- Original Message -

Dear list,

I'm wondering if anyone out there has the experience of dealing with the
probability distribution of data sets of a large sample size, e.g.,
n > 10,000. I am studying the probability features of chemical element
concentrations in a USGS sediment database with a sample number of around
50,000, and have found that it is virtually impossible for any real data
set to pass tests for normality, as the tests become too powerful with the
increase of sample size. It is widely observed that geochemical data do not
follow a normal or even a lognormal distribution. However, I feel that the
large sample size is also making trouble.

I am looking for references on this topic. Any references or comments are
welcome.

Cheers,
Chaosheng

---

Chaosheng

Your problem may be 'non-stationarity' rather than the large sample size.
If you have so many samples, you are probably sampling more than one
'population'.

We have had success in fitting lognormals to mining data sets of up to half
a million, where these are all within the same geological environment and
primary mineralisation.

We have also had a lot of success in reasonably large data sets (up to
100,000) with fitting mixtures of two, three or four lognormals (or
Normals) to characterise different populations. See, for example, the paper
given at the Australian Mining Geology conference in 1993, on my page at
http://drisobelclark.ontheweb.com/resume/Publications.html

Isobel
http://ecosse.ontheweb.com

--

Chaosheng,
Can't you do a Monte Carlo simulation for the distribution? In S-Plus, you
can create confidence intervals from an MC simulation with a sample size as
large as you have. That is, you draw 50,000 or so points from a normal
distribution and calculate the distribution. You then re-run this a number
of times (e.g., 1000) to establish approximate confidence intervals. You
can then see what proportion of your data points fall outside the
approximate confidence intervals; you would expect no more than 5% or so of
the data points to fall outside the intervals if your distribution is
normal. If more than 5% fall outside, then you really don't have a normal
distribution (since a normal distribution is essentially a random
distribution, I would doubt that any real data set would be truly normal -
the sampling distribution is another issue).

Anyway, just some thoughts. Hope everything is well with you.

Regards,
Ned
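
A minimal numpy sketch of this Monte Carlo envelope idea (the original
suggestion used S-Plus; the sample sizes here are reduced to keep memory
modest, and the "data" is a simulated stand-in):

import numpy as np

rng = np.random.default_rng(0)
n, nsim = 10_000, 500        # reduced from 50,000 / 1,000 to save memory

data = rng.normal(0.0, 1.0, n)               # stand-in for the real data
z = np.sort((data - data.mean()) / data.std(ddof=1))

# simulate many standardized-normal samples; each row is one simulation
sims = np.sort(rng.normal(0.0, 1.0, (nsim, n)), axis=1)
lo, hi = np.percentile(sims, [2.5, 97.5], axis=0)   # pointwise 95% envelope

print("fraction outside 95% envelope:", np.mean((z < lo) | (z > hi)))
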
---

I presume your null hypothesis is that the data come from the given
distribution, as is usual in goodness-of-fit tests. If such is the case,
your sample size will almost surely lead to rejection. The well-known
logical inconsistencies of the standard test of hypothesis based on the
p-value are magnified under large n.

You have these options at least:

1) Find some authority that says that for large sample sizes the p-value is
less informative, e.g. Lindley and Scott. 1984. New Cambridge Elementary
Statistical Tables. Cambridge Univ Press; and then you can throw away your
goodness-of-fit test. But be warned that equally important authorities have
said exactly the contrary thing, that the force of the p-value is stronger
for large sample sizes (Peto et al. 1976. British Medical Journal
34:585-612). To make

RE: AI-GEOSTATS: Summary: Large sample size and normal distribution

2003-08-14 Thread zij
Hi,

I'm not sure I agree with the idea that a test can be too powerful. This is
a common argument in simulation experiments: that because you can do an
infinite number of replicate simulations, somehow the differences detected
are not real. In fact, the differences are real. They may not be
biologically (or geologically or whatever field you are in) significant,
but they are still real. That is why it is better to decide first on the
magnitude of difference that you consider significant.

Now, in the case of deviation from normality, I suppose you wouldn't have
much intuition about what is significant, but the relevant question is what
is the effect of small deviations from normality on your test or
conclusions of your analysis? These kinds of studies are out there in the
statistical literature for many tests (t-tests etc.) -- I'm not sure how
much has been done to look at the robustness of geostatistical analyses,
but there are probably some studies (does anyone know?). I would not opt
for a less-powerful test just to justify an assumption - that's, like,
unethical or something.

Yetta
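
A quick numerical sketch of both points - the deviation is real but tiny,
while the test p-value collapses as n grows - assuming Python/scipy and an
invented 5% contamination model (with parameters estimated from the data,
the nominal KS p-values are only indicative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def slightly_non_normal(n):
    # 95% N(0,1) with 5% N(0,2) contamination: a small but real deviation
    k = rng.random(n) < 0.05
    return np.where(k, rng.normal(0, 2, n), rng.normal(0, 1, n))

for n in (100, 1_000, 10_000, 100_000):
    x = slightly_non_normal(n)
    stat, p = stats.kstest((x - x.mean()) / x.std(ddof=1), "norm")
    print(f"n={n:>7}  KS stat={stat:.4f}  p={p:.2e}")

# The KS statistic (the size of the deviation) stabilizes at a small value,
# while the p-value heads to zero: the test is "too powerful" only in the
# sense that it detects a real but tiny effect.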





AI-GEOSTATS: 3D variogram analysis.

2003-08-14 Thread Tayfun Yusuf YÜNSEL



Dear All,

I am studying geostatistical reserve estimation and grade distribution, and
I need some information about variograms. Anisotropy axes can be found
easily in 2D. But in mining, the data include x, y, z coordinates of each
datum (for example, a drilling log split into 3 parts, or composited data),
and thus the study requires three-dimensional variogram analysis. I tried
anisotropy axes in the x-y plane and in the x-z or y-z (vertical) planes.
But there can be anisotropy in a direction, for example, diagonal to the
x-y plane. There is a huge number of possible anisotropy directions in 3D.
I have read many geostatistical papers but I am not satisfied.

Could you please give me some information about 3D variogram analysis or
related papers?

Thanks in advance.

Tayfun Yusuf YÜNSEL
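
For what it is worth, a minimal sketch of a directional experimental
variogram in 3D with an angular tolerance, assuming Python/numpy and
invented axis conventions (x = east, y = north, z = up; real packages
differ in conventions and are far more efficient):

import numpy as np

def directional_variogram(xyz, v, azimuth, dip, lags, lag_tol, ang_tol=22.5):
    # Semivariogram along the 3-D direction given by azimuth (degrees
    # clockwise from north in the x-y plane) and dip (degrees below
    # horizontal). O(n^2) pairs: for modest data sets or subsets.
    xyz, v = np.asarray(xyz, float), np.asarray(v, float)
    az, dp = np.radians(azimuth), np.radians(dip)
    u = np.array([np.sin(az) * np.cos(dp),      # x = east
                  np.cos(az) * np.cos(dp),      # y = north
                  -np.sin(dp)])                 # z = up
    iu, ju = np.triu_indices(len(v), k=1)       # each pair once
    dvec = xyz[ju] - xyz[iu]                    # pair separation vectors
    h = np.linalg.norm(dvec, axis=1)
    cosang = np.abs(dvec @ u) / np.where(h > 0, h, np.inf)
    keep = cosang >= np.cos(np.radians(ang_tol))
    dz2 = 0.5 * (v[iu] - v[ju]) ** 2            # semivariance contributions
    gam = []
    for lag in lags:
        m = keep & (np.abs(h - lag) <= lag_tol)
        gam.append(dz2[m].mean() if m.any() else np.nan)
    return np.array(gam)

Scanning a handful of azimuth/dip combinations and comparing ranges is one
common way to search for the principal anisotropy directions.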

RE: AI-GEOSTATS:

2003-08-14 Thread Munroe, Darla K
Look in any good analytical cartography textbook you can find - in the
discussion of symbolization they should have the formula.
-Original Message-
From: tamas
To: [EMAIL PROTECTED]
Sent: 8/9/03 7:07 AM
Subject: AI-GEOSTATS: 

Dear Members,
Where can I find more details about the Jenks (natural breaks) optimization
that is used by ArcView (ESRI) to classify? Is this absolutely different
from Kolmogorov-Smirnov?
Thanks for help.
Janos Tamas
University of Debrecen
http://gisserver1.date.hu
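
For what it is worth, the Jenks criterion can be sketched directly: it
picks class breaks minimizing the within-class sum of squared deviations,
which is a classification criterion, not a goodness-of-fit test like
Kolmogorov-Smirnov. A minimal dynamic-programming sketch in Python (my own,
not ESRI's implementation):

import numpy as np

def jenks_breaks(values, k):
    # Optimal 1-D classification minimizing the within-class sum of
    # squared deviations (the criterion behind Jenks natural breaks).
    # O(k*n^2): fine for a few thousand values.
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    s1 = np.concatenate(([0.0], np.cumsum(x)))       # prefix sums
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))   # prefix sums of squares

    def ssd(i, j):   # sum of squared deviations of x[i:j] (j exclusive)
        m = j - i
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / m

    cost = np.full((k + 1, n + 1), np.inf)   # cost[c][j]: best SSD, c classes
    cut = np.zeros((k + 1, n + 1), dtype=int)
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                t = cost[c - 1][i] + ssd(i, j)
                if t < cost[c][j]:
                    cost[c][j], cut[c][j] = t, i
    breaks, j = [], n        # recover the upper bound of each class
    for c in range(k, 0, -1):
        breaks.append(x[j - 1])
        j = cut[c][j]
    return sorted(breaks)

print(jenks_breaks([1, 2, 3, 12, 13, 14, 50, 51], 3))   # -> [3.0, 14.0, 51.0]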



Re: AI-GEOSTATS: LAI (leaf area index) from NDVI

2003-08-14 Thread Chris Hlavka


You might want to consider the implications of using data with different
supports - 3x3 neighborhoods and points. The 3x3 neighborhoods are probably
larger than the areas associated with the LAI field values. Thus the 3x3
mean NDVI's can be considered to be estimates of the NDVI's at the points,
with an associated error. Error in the independent (x) variable leads to
underestimated correlation and (ordinary least squares (OLS) regression)
slope. There are a number of alternatives to OLS, such as MA and RMA
regression, that might lead to improved slope estimates, but I prefer to
correct the correlation and slope estimates using estimates of the
precision of the x variable.

You can roughly estimate the precision of the point NDVI by: 1) calculating
the standard deviation of the nine pixel NDVI's associated with each field
observation; 2) plotting the standard deviations versus the means to check
for dependence of variation on magnitude; 3) estimating the standard error
of NDVI as the mean standard deviation if there is no dependence, otherwise
considering regression of LAI versus log(NDVI) or log-log regression. Note
that if the point area is much smaller than a pixel, the x error will be
underestimated - but fixing this would involve either a geostatistical
analysis of point values for NDVI or estimation involving fractal analysis.
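
In outline, steps 1-3 might look like this (a sketch; ndvi_windows, an
(n_sites, 3, 3) array of the pixel neighborhoods, is a hypothetical input,
and the 0.3 correlation cutoff is an arbitrary choice of mine):

import numpy as np

def ndvi_precision(ndvi_windows):
    # Step 1: mean and SD of the nine pixel NDVI's at each field site
    flat = ndvi_windows.reshape(len(ndvi_windows), -1)
    means = flat.mean(axis=1)
    sds = flat.std(axis=1, ddof=1)
    # Step 2: check whether spread depends on magnitude
    r = np.corrcoef(means, sds)[0, 1]
    # Step 3: if no clear dependence, pool into one standard error;
    # otherwise consider the log transforms discussed above
    s_d = sds.mean() if abs(r) < 0.3 else None
    return means, sds, r, s_d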

The error estimate can then be used to correct the estimates of correlation
and slope (my apologies for cutting and pasting from a Word file, so that
subscripts and superscripts are lost, and maybe also the Greek font, and
for illustrating with S commands):

Preliminaries: First, consider regression of variable x (the independent
variable) versus y (the dependent variable). The usual formula for the
slope is:

    Σ_i [(x_i - m_x)(y_i - m_y)] / Σ_i (x_i - m_x)^2            (1)

where summation is over the index i for individual data points, and the
means are m_x and m_y. This formula (section 1.2 in N. Draper and H. Smith,
Applied Regression Analysis, John Wiley & Sons, Inc., New York, 1966) is
correct, and computationally simple and accurate, that is, works well to
preserve floating point accuracy. However, formulae involving descriptive
statistics (the correlation or covariance of x and y, and the standard
deviations s_x and s_y) convey more information about the factors related
to the slope:

    cor(x,y) * s_y / s_x    or    cov(x,y) / s_x^2              (2)

where one can see that the magnitude of the slope increases with the
correlation and with the range of the dependent variable y (as measured by
the standard deviation), and decreases with the range of the independent
variable. If one of the formulae in (2) is used with n data points, it will
be accurate (unbiased) if multiplied by the square root of (n-2)/(n-1) to
correct for the effect of using estimated, rather than true, means, and if
the usual assumptions, including accurate values for the independent
variable, are correct. If the range of the independent variable is inflated
by errors, the slope will decrease, that is, will be biased low.

Predicting the slope when precise values of the independent variable X are
replaced by estimated or measured values x, following Section 29.56 in M.
Kendall and A. Stuart, The Advanced Theory of Statistics: Volume 2:
Inference and Relationship, 4th Edition, Charles Griffin & Company Limited,
London, 1979 (copy in your mailbox). Let's assume that the measurements are
made without bias and with a precision represented as a standard deviation
in error: the observed measurements (x_1, x_2, ...) of the independent
variable can be considered as sums of the true values (X_1, X_2, ...) plus
errors (d_1, d_2, ...) with average 0 and standard deviation s_d. The least
squares regression slope is cov(x,y)/s_x^2 = cov(x,y)/(s_X^2 + s_d^2),
where cov(x,y) is the covariance between x and y, i.e. the correlation
times the product of the standard deviations of x and y. Now if the least
squares slope with no errors is cov(x,y)/s_X^2 = 1, then the slope with the
errors is:

    cov(x,y)/(s_X^2 + s_d^2) = [cov(x,y)/s_X^2] * [s_X^2/(s_X^2 + s_d^2)]
                             = s_X^2/(s_X^2 + s_d^2)
                             = 1/[1 + (s_d/s_X)^2]              (3)

The expression on the right is a function of the relative magnitude s_d/s_X
of the measurement error to the data range for the independent variable,
where the standard deviation is the metric. The range term s_X can be
approximated with s_x, the standard deviation of the measured values, if
the range of measurements is large compared to the measurement errors.
Otherwise, correct for the effect of measurement error by using
s_x^2 - s_d^2 in place of s_X^2, leading (as you have noted) to
(s_x^2 - s_d^2)/s_x^2 as the predicted slope. The estimate for s_d is
generally known from an independent source, such as instrument specs or
calibration analysis.
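
A small simulation of the attenuation in (3) and of this correction, in
Python/numpy (synthetic numbers chosen only to exercise the algebra):

import numpy as np

rng = np.random.default_rng(0)
n, sd_err = 500, 0.5
x_true = rng.normal(0, 1, n)
y = x_true + rng.normal(0, 0.2, n)           # true slope = 1
x_obs = x_true + rng.normal(0, sd_err, n)    # x measured with error

# naive OLS slope, attenuated by roughly 1/[1 + (s_d/s_X)^2] = 0.8 here
slope_naive = np.cov(x_obs, y)[0, 1] / x_obs.var(ddof=1)
# corrected slope, using the known measurement precision sd_err
slope_corr = np.cov(x_obs, y)[0, 1] / (x_obs.var(ddof=1) - sd_err**2)
print(f"naive: {slope_naive:.3f}  corrected: {slope_corr:.3f}")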

You will note that the slope predicted with (3) is always less than 1. The
mathematical cause is the inflation of the denominator from s_X^2 to
s_X^2 + s_d^2. Perhaps what is counter-intuitive is that the slope is
biased due to mean zero

AI-GEOSTATS: Call for papers Int Colloquium on LUCC and Env Problems 19-21 December 2003

2003-08-14 Thread Ernan R



FYI


Dr Ernan Rustiadi
Laboratory of Land Resources Development Planning
Department of Soil Sciences, Faculty of Agriculture
Bogor Agricultural University (IPB)
Jl. Meranti, Darmaga Campus of IPB.
Bogor, INDONESIA
Tel./Fax: +62-251-422322
email: [EMAIL PROTECTED], [EMAIL PROTECTED]
http://www.hdp-ina.org,
http://www.hdp-ina.net,
http://www.rawg.org


Call for Papers

LUCC and Environmental Problems
International Colloquium, 19-21 December 2003

The uncontrolled exploitation of natural resources is currently reaching an
alarming level. The environmental problems created by such exploitation can
have consequences of disastrous magnitude. The integrity and conservation
of the environment, coupled with sustainable development, is very important
for the progress of the nation and industrial growth. Understanding the
dynamics of land use and land cover change processes plays an important
role in improving the understanding of the dynamic interactions between
human activities and natural resources utilization. The two-day Colloquium
2003 on land use and land cover change will bring together researchers,
academicians, managers and planners as well as critics from different
universities, industries, organizations and agencies from all over the
world to participate in the ongoing discussions and present their findings
and opinions for the promotion of broad knowledge of natural and
environmental systems, LUCC, and its issues.

OBJECTIVES
The main purposes of this Colloquium are:
- To present research findings related to land use and land cover change
- To discuss and evaluate the latest methods and issues of land use and
land cover change analysis
- To promote co-operation and networking among scientists, experts,
practitioners and researchers involved in addressing the latest land use
and land cover change

SESSIONS
The scope of this Colloquium covers the relationship between land use and
land cover change and:
- Urbanization
- Deforestation
- Food Problems
- Water Problems

VENUE
The colloquium will be held at Hotel Salak and the conference centre of
Bogor Agriculture University, situated in a peaceful setting near Bogor
Botanical Garden, south of Jakarta. The colloquium is open to 70 active
participants. Participation is charged. Potential participants should refer
to the registration form.

ABSTRACT
Contributors should submit a title and a 150-word abstract of their paper
by September 30, 2003 by e-mail to: [EMAIL PROTECTED], [EMAIL PROTECTED],
or [EMAIL PROTECTED], or visit
http://www.hdp-ina.org,
http://www.hdp-ina.net,
http://www.raug.org
Authors of selected abstracts will be notified by October 20, 2003.
Submission of full papers should be received before November 30, 2003.
Papers presented at the colloquium will be considered for publication in a
book.

GUIDANCE FOR AUTHORS
Papers should be written in English on A4 paper size with Times New Roman
font (12 pt), in 2 columns (except abstracts). Paper contents are: title,
authors and their e-mail addresses, background, methods, discussion,
conclusion and references. Abstracts can be sent by e-mail attachment or by
mail to the Organizing Committee.

REGISTRATION
Participants can enroll by fax, e-mail or by mailing the completed
registration form to the Organizing Committee.



 



Re: AI-GEOSTATS: Summary: Large sample size and normal distribution

2003-08-14 Thread Chaosheng Zhang



Dear Yetta,

Thanks for the comments, and I agree with you. I 
think there is a function between sample size and statistical power. The power 
increases with the increase of n. It's true that it is hard to define how 
powerful is "too powerful". Some people suggest to use a lower significance 
level for large n. However, it is also a problem that how low (e.g., 0.001) 
is low enough? Some people suggest not to use the p-valueas mentioned in 
the summary. 

It is also a 
question how serious it may be if the data set does not follow a normal 
distribution. Statisticians may provide us some artificial examples 
showinghow serious it is, but this may not be so serious in the real world 
if it's only a minor departure. Some people even say that statistical methods 
can not be used because our samples are not independent at all because of 
spatial autocorrelation. Well, perhaps I have gone too far, but it is an 
interesting topic. (Geo)statisticians may have better comments.

By the way, I may not summarize again. If anyone 
would like to share your ideas with the list, please copy to it.

Cheers,

Chaosheng


- Original Message -
From: "zij" [EMAIL PROTECTED]
To: "ai-geostats" [EMAIL PROTECTED]; "Chaosheng Zhang" [EMAIL PROTECTED]
Sent: Monday, August 11, 2003 7:13 PM
Subject: RE: AI-GEOSTATS: Summary: Large sample size and normal
distribution


Re: AI-GEOSTATS: programming ArcView GIS

2003-08-14 Thread Luigi Maiorano
Hi Ellen

every program written for ArcView uses the Avenue language (there are a lot
of books from ESRI regarding Avenue programming). However, ArcGIS 8.x uses
ArcObjects (practically Visual Basic). ESRI itself has published a book on
translating Avenue scripts into ArcObjects scripts. You should also
consider that many .apr scripts are protected (for copyright reasons) and
thus you cannot read the code directly.

Luigi Maiorano
PhD Student
College of Natural Resources
University of Idaho, Moscow
83843, Idaho (USA)
[EMAIL PROTECTED]

- Original Message -
From: Ellen De Beuckeleer [EMAIL PROTECTED]
Date: Thursday, August 7, 2003 2:57 pm
Subject: AI-GEOSTATS: programming ArcView GIS

> Dear List-members,
>
> How can I program applications for ArcView GIS?
>
> The book Statistical Analysis with ArcView GIS, by Jay Lee and David
> Wong, comes with some example scripts, which have the file extension
> .apr. Unfortunately these files only work with the 3.x version of
> ArcView. I am using version 8, and for my PhD I would like to learn how
> to program applications for ArcView GIS, especially the Moran's I index.
>
> In which language are .apr files constructed? Are there any good books
> concerning this issue? Where do I start?
>
> Greets,
>
> Ellen
 
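Whatever the language, the statistic itself is compact. A sketch of global
Moran's I in Python/numpy rather than Avenue or ArcObjects (w, a
precomputed (n, n) spatial weights matrix, is assumed here; building it is
the real GIS work):

import numpy as np

def morans_i(w, x):
    # Global Moran's I: (n / S0) * (z' W z) / (z' z),
    # where z are deviations from the mean and S0 the sum of all weights
    x = np.asarray(x, float)
    z = x - x.mean()
    s0 = w.sum()
    return (len(x) / s0) * (z @ w @ z) / (z @ z)
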




AI-GEOSTATS: Cluster Analysis

2003-08-14 Thread Wes Highfield

I have a point data set (~150k records) that contains a ratio variable
(percentages, both positive and negative) that I would like to run a local
cluster analysis on. Global Moran's I and Geary's C indicate clustering as
a whole; I just need a statistical measure of where. If I understand
correctly (a big "if" at this point), a local Moran's I must be aggregated
by region or represent continuous data, which for various reasons I would
like to stay away from. If I can correctly define a lag distance, would a
local G statistic be better? What other tests may be appropriate for local
cluster analysis of discrete point data without having to use predefined
regions or aggregate the data?
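
If it helps, a minimal sketch of the Getis-Ord local G* statistic with a
binary distance band, assuming Python/numpy (a dense distance matrix will
not scale to all 150k points - subsample or use a tested spatial-statistics
library, and remember to correct for multiple testing):

import numpy as np

def local_g_star(coords, z, d):
    # Getis-Ord G_i* with binary distance-band weights (self included).
    # Dense O(n^2) distances: fine for a subsample, not for 150k points.
    coords = np.asarray(coords, float)
    z = np.asarray(z, float)
    n = len(z)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = (dist <= d).astype(float)   # binary weights, w_ii = 1 (the G* variant)
    wi = w.sum(axis=1)              # sum of weights for each location i
    num = w @ z - wi * z.mean()
    den = z.std(ddof=0) * np.sqrt((n * wi - wi**2) / (n - 1))
    return num / den                # approx. N(0,1) under no clustering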





Wes Highfield

Graduate Research Assistant

Department of Landscape Architecture and Urban Planning

Texas A&M University

College Station, TX 77843-3137