RE: AI-GEOSTATS: Summary: Large sample size and normal distribution
Hi,

I'm not sure I agree with the idea that a test can be too powerful. This is a common argument in simulation experiments: because you can do an infinite number of replicate simulations, somehow the differences detected are not real. In fact, the differences are real. They may not be biologically (or geologically, or whatever field you are in) significant, but they are still real. That is why it is better to decide first on the magnitude of difference that you consider significant.

The null hypothesis is always false, although it might be false by a very small quantity; that is the trivial fact that the very large sample size illustrates in the common test of significance. The conclusion to be drawn from this is not that we must set in advance the amount of difference that we would find significant (a rather restrictive strategy which will be violated very often because it is nonsensical), but rather that the only sensible strategy is to compare hypotheses one against another. This can be done on an evidential basis by evaluating the likelihood ratio: the likelihood of the data under one hypothesis divided by the likelihood of the data under another hypothesis. By constructing the whole likelihood function (in the case of a single parameter), any pair of hypotheses can be compared through the value of the likelihood ratio.

Now, in the case of deviation from normality, I suppose you wouldn't have much intuition about what is significant, but the relevant question is: what is the effect of small deviations from normality on your test or on the conclusions of your analysis? Perhaps a better question is what the data say about a given hypothesis for the mean versus another value for the mean, assuming the normal distribution is true. If the variance is unknown there is a simple solution only for the normal and a few other cases, by orthogonalization, and then the two parameters can be assessed separately. For comparing two different models, say normal versus lognormal, a likelihood-based approach, the Akaike Information Criterion, is available, although I am not sure that Akaike's approach is fully in agreement with the likelihood principle.

Ruben
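As a rough illustration of the likelihood-based comparison Ruben describes, here is a minimal Python sketch (not from the original post; the data are simulated and the parameter values arbitrary) that fits a normal and a lognormal model to the same positive-valued data and compares them by log-likelihood ratio and AIC:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.lognormal(mean=1.0, sigma=0.5, size=5000)   # illustrative data only

# Maximum-likelihood fits: normal vs lognormal (2 parameters each)
mu, sd = x.mean(), x.std()
ll_norm = stats.norm.logpdf(x, loc=mu, scale=sd).sum()

logx = np.log(x)
mu_l, sd_l = logx.mean(), logx.std()
ll_lnorm = stats.lognorm.logpdf(x, s=sd_l, scale=np.exp(mu_l)).sum()

# Evidence: log-likelihood ratio, plus AIC = 2k - 2*logL with k = 2 for both models
print("log-likelihood ratio (lognormal vs normal):", ll_lnorm - ll_norm)
print("AIC normal   :", 2 * 2 - 2 * ll_norm)
print("AIC lognormal:", 2 * 2 - 2 * ll_lnorm)

Because both models have the same number of parameters here, the AIC comparison reduces to the likelihood ratio itself; AIC earns its keep when the competing models differ in complexity.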
Re: AI-GEOSTATS: Simulation and trends
Adrian,

Thank you for the reminder of one of the strengths of Turning Bands. Certainly I have no argument with your points. However, Chris' question was about how to include trend in SGS, and that is what my answer is about.

Isobel
http://ecosse.ontheweb.com
Re: AI-GEOSTATS: plurigaussian simulations
Hi, Adrian - see the textbook by M. Armstrong, Galli et al.: Plurigaussian Simulation in Geosciences. Springer Verlag, 2003 (includes a CD with demo software), or Lantuejoul, C.: Geostatistical Simulation - Models and Algorithms. Springer, 2002.

Heinz Burger
--
Dr. Heinz Burger
Freie Universitaet Berlin - Geoinformatik -
Malteserstr. 74-100, 12249 BERLIN, Germany
Tel. (49) 30-838-70561 Fax: (49) 30-838-70723
mailto: [EMAIL PROTECTED]
Website: http://userpage.fu-berlin.de/~hburger/hb
AI-GEOSTATS: Summary: Large sample size and normal distribution
Dear All,

One week ago I posted a question about large n and the normal distribution, and I have received several good replies from Isobel Clark, Ned Levine, Ruben Roa Ureta, Thies Dose, Chris Hlavka, Donald Myers and Jeffrey Blume. Jeffrey is perhaps not on the list, but I assume he has no objections if I copy his message to the list.

Generally speaking, when n is very large, e.g., n > 1,000, which is very common in geochemistry nowadays, statistical (goodness-of-fit) tests become too powerful, and the p-values are less informative. Therefore, users need to be very careful in using these tests with a large n. Suggestions to solve this problem include: (1) use graphical methods; (2) develop methods which are suitable for large n; (3) use methods which are not sensitive to n. The solutions may not be very satisfactory, but I do hope statisticians pay more attention to large samples, as they have been paying too much attention to small ones. More personal discussion is welcome. If you need some data sets to play with, please feel free to get in touch with me.

Please find below the original question and the replies. I would like to express my sincere thanks to all those who replied to me (I hope nobody is missing from the above list).

Cheers,
Chaosheng
--
Dr. Chaosheng Zhang
Lecturer in GIS
Department of Geography
National University of Ireland, Galway
IRELAND
Tel: +353-91-524411 x 2375
Fax: +353-91-525700
E-mail: [EMAIL PROTECTED]
Web 1: www.nuigalway.ie/geography/zhang.html
Web 2: www.nuigalway.ie/geography/gis/index.htm

- Original Message -

Dear list,

I'm wondering if anyone out there has experience of dealing with the probability distribution of data sets of a large sample size, e.g., n > 10,000. I am studying the probability features of chemical element concentrations in a USGS sediment database with a sample number of around 50,000, and have found that it is virtually impossible for any real data set to pass tests for normality, as the tests become too powerful with the increase of sample size. It is widely observed that geochemical data do not follow a normal or even a lognormal distribution. However, I feel that the large sample size is also making trouble. I am looking for references on this topic. Any references or comments are welcome.

Cheers,
Chaosheng

---

Chaosheng,

Your problem may be 'non-stationarity' rather than the large sample size. If you have so many samples, you are probably sampling more than one 'population'. We have had success in fitting lognormals to mining data sets of up to half a million, where these are all within the same geological environment and primary mineralisation. We have also had a lot of success with reasonably large data sets (up to 100,000) in fitting mixtures of two, three or four lognormals (or Normals) to characterise different populations. See, for example, the paper given at the Australian Mining Geology conference in 1993 on my page at http://drisobelclark.ontheweb.com/resume/Publications.html

Isobel
http://ecosse.ontheweb.com

---

Chaosheng,

Can't you do a Monte Carlo simulation for the distribution? In S-Plus, you can create confidence intervals from a MC simulation with a sample size as large as you have. That is, you draw 50,000 or so points from a normal distribution and calculate the distribution. You then re-run this a number of times (e.g., 1000) to establish approximate confidence intervals. You can then check what proportion of your data points fall outside the approximate confidence intervals; you would expect no more than 5% or so of the data points to fall outside the intervals if your distribution is normal. If more than 5% fall outside, then you really don't have a normal distribution. (Since a normal distribution is essentially a random distribution, I would doubt that any real data set would be truly normal - the sampling distribution is another issue.) Anyway, just some thoughts. Hope everything is well with you.

Regards,
Ned

---

I presume your null hypothesis is that the data come from the given distribution, as is usual in goodness-of-fit tests. If such is the case, your sample size will almost surely lead to rejection. The well-known logical inconsistencies of the standard test of hypothesis based on the p-value are magnified under large n. You have at least these options:

1) Find some authority that says that for large sample sizes the p-value is less informative, e.g. Lindley and Scott. 1984. New Cambridge Elementary Statistical Tables. Cambridge Univ Press; and then you can throw away your goodness-of-fit test. But be warned that equally important authorities have said exactly the contrary, that the force of the p-value is stronger for large sample sizes (Peto et al. 1976. British Medical Journal 34:585-612). To make
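Ned's Monte Carlo envelope suggestion can be sketched roughly as follows (my own Python illustration, not his S-Plus code; the data here are simulated as a stand-in for a real data set): simulate many normal samples of the same size, build a pointwise envelope on the sorted values, and see how much of the real (sorted) data falls outside it.

import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_t(df=10, size=50_000)       # stand-in for the real data
n = data.size
mu, sd = data.mean(), data.std(ddof=1)

B = 200                                         # number of replicate simulations
sims = np.sort(rng.normal(mu, sd, size=(B, n)), axis=1)
lo, hi = np.percentile(sims, [2.5, 97.5], axis=0)   # pointwise 95% envelope

srt = np.sort(data)
outside = np.mean((srt < lo) | (srt > hi))
print(f"fraction of sorted data outside the 95% envelope: {outside:.3f}")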
RE: AI-GEOSTATS: Summary: Large sample size and normal distribution
Hi,

I'm not sure I agree with the idea that a test can be too powerful. This is a common argument in simulation experiments: because you can do an infinite number of replicate simulations, somehow the differences detected are not real. In fact, the differences are real. They may not be biologically (or geologically, or whatever field you are in) significant, but they are still real. That is why it is better to decide first on the magnitude of difference that you consider significant.

Now, in the case of deviation from normality, I suppose you wouldn't have much intuition about what is significant, but the relevant question is: what is the effect of small deviations from normality on your test or on the conclusions of your analysis? These kinds of studies are out there in the statistical literature for many tests (t-tests etc.) -- I'm not sure how much has been done to look at the robustness of geostatistical analyses, but there are probably some studies (does anyone know?). I would not opt for a less powerful test just to justify an assumption - that's, like, unethical or something.

Yetta
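One way to get at the practical question raised here -- what do small departures from normality actually do to your conclusions -- is a small simulation of the type I error rate of a t-test on mildly skewed data. The Python sketch below is my own illustration and assumes nothing about any real data set:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha = 50, 5000, 0.05
true_mean = np.exp(0.25 ** 2 / 2)     # mean of a lognormal(0, 0.25) variable

rejections = 0
for _ in range(reps):
    x = rng.lognormal(mean=0.0, sigma=0.25, size=n)   # mildly skewed data
    _, p = stats.ttest_1samp(x, popmean=true_mean)    # H0 is in fact true
    rejections += p < alpha

print("empirical type I error rate:", rejections / reps)   # compare with 0.05

If the empirical rejection rate stays close to the nominal 0.05, the mild non-normality is doing little harm to that particular inference, whatever a goodness-of-fit test says.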
AI-GEOSTATS: 3D variogram analysis.
Dear All,

I am studying geostatistical reserve estimation and grade distribution, and I need some information about variograms. Anisotropy axes can be found easily in 2D. But in mining, the data include x, y, z coordinates for each datum (for example, a drilling log split into three parts, or composited data), and thus the study requires three-dimensional variogram analysis. I have tried anisotropy axes in the x-y plane and in the x-z or y-z (vertical) planes. But there can be anisotropy in a direction that is, for example, diagonal to the x-y plane. There is a huge number of possible anisotropy directions in 3D. I have read many geostatistical papers but I am not satisfied. Could you please give me some information about 3D variogram analysis, or point me to related papers?

Thanks in advance.
Tayfun Yusuf YÜNSEL
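For what it is worth, the mechanics of a 3D directional experimental variogram are compact enough to sketch. The Python code below is my own illustration (not from any particular paper); the azimuth/dip convention, tolerances and synthetic data are assumptions. Pairs are kept when their separation vector lies within an angular tolerance of a chosen azimuth/dip direction, and the semivariance is averaged within lag bins; directions are then scanned to look for anisotropy.

import numpy as np

def directional_variogram(xyz, values, azimuth_deg, dip_deg,
                          lag_width, n_lags, angle_tol_deg=22.5):
    az, dip = np.radians(azimuth_deg), np.radians(dip_deg)
    # unit vector of the chosen direction (azimuth measured from the y axis, dip positive down)
    d = np.array([np.sin(az) * np.cos(dip),
                  np.cos(az) * np.cos(dip),
                  -np.sin(dip)])
    diffs = xyz[:, None, :] - xyz[None, :, :]          # all pairwise separation vectors
    h = np.linalg.norm(diffs, axis=-1)
    with np.errstate(invalid="ignore", divide="ignore"):
        cosang = np.abs(np.einsum("ijk,k->ij", diffs, d)) / h
    keep = (h > 0) & (cosang >= np.cos(np.radians(angle_tol_deg)))

    sq = 0.5 * (values[:, None] - values[None, :]) ** 2    # semivariance of each pair
    lags, gammas = [], []
    for k in range(n_lags):
        m = keep & (h >= k * lag_width) & (h < (k + 1) * lag_width)
        if m.any():
            lags.append(h[m].mean())
            gammas.append(sq[m].mean())
    return np.array(lags), np.array(gammas)

# Example with synthetic drillhole-like data
rng = np.random.default_rng(2)
xyz = rng.uniform(0, 100, size=(300, 3))
vals = rng.normal(size=300)
lags, gam = directional_variogram(xyz, vals, azimuth_deg=45, dip_deg=30,
                                  lag_width=10.0, n_lags=8)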
RE: AI-GEOSTATS:
Look in any good analytical cartography textbook you can find - in the discussion of symbolization, they should have the formula.

-----Original Message-----
From: tamas
To: [EMAIL PROTECTED]
Sent: 8/9/03 7:07 AM
Subject: AI-GEOSTATS:

Dear Members,

Where can I find more details about the Jenks (natural breaks) optimization used by ArcView/ESRI for classification? Is this completely different from Kolmogorov-Smirnov?

Thanks for the help.
Janos Tamas
University of Debrecen
http//gisserver1.date.hu
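For readers without a cartography textbook to hand, the optimization itself is easy to state: choose class breaks that minimise the total within-class sum of squared deviations from the class means. Below is a minimal Python sketch of that generic Fisher-Jenks dynamic program (my own illustration, not ESRI's actual implementation, and only practical for modest n):

import numpy as np

def jenks_breaks(values, k):
    """Optimal 1-D classification into k classes, minimising the total
    within-class sum of squared deviations from the class means."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    # prefix sums give the within-class sum of squares of any slice x[i:j] in O(1)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    csum2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def ssd(i, j):
        s, s2, m = csum[j] - csum[i], csum2[j] - csum2[i], j - i
        return s2 - s * s / m

    # dp[c][j]: minimal cost of splitting the first j values into c classes
    dp = np.full((k + 1, n + 1), np.inf)
    cut = np.zeros((k + 1, n + 1), dtype=int)
    dp[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cost = dp[c - 1][i] + ssd(i, j)
                if cost < dp[c][j]:
                    dp[c][j], cut[c][j] = cost, i

    # recover the upper bound of each class
    breaks, j = [], n
    for c in range(k, 0, -1):
        breaks.append(x[j - 1])
        j = cut[c][j]
    return sorted(breaks)

print(jenks_breaks([1, 2, 3, 11, 12, 13, 24, 25, 26], k=3))   # -> [3.0, 13.0, 26.0]

Note the contrast with Kolmogorov-Smirnov, which is a goodness-of-fit test against a reference distribution, not a classification rule.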
Re: AI-GEOSTATS: LAI (leaf area index) from NDVI
You might want to consider the implications of using data with different supports - 3x3 neighborhoods and points. The 3x3 neighborhoods are probably larger than the areas associated with the LAI field values. Thus the 3x3 mean NDVIs can be considered to be estimates of the NDVIs at the points, estimates that carry error. Error in the independent (x) variable leads to underestimated correlation and (ordinary least squares, OLS, regression) slope. There are a number of alternatives to OLS, such as MA and RMA regression, that might lead to improved slope estimates, but I prefer to correct the correlation and slope estimates using estimates of the precision of the x variable.

You can roughly estimate the precision of the point NDVI by: 1) calculating the standard deviation of the nine pixel NDVIs associated with each field observation, 2) plotting the standard deviations versus the means to check for dependence of the variation on magnitude, and 3) estimating the standard error of NDVI as the mean standard deviation if there is no dependence; otherwise consider regression of LAI versus log(NDVI), or log-log regression. Note that if the point area is much smaller than a pixel, the x error will be underestimated - but fixing this would involve either a geostatistical analysis of point values for NDVI or estimation involving fractal analysis.

The error estimate can then be used to correct the estimates of correlation and slope (my apologies for cutting and pasting from a Word file so that subscripts and superscripts were lost, and maybe also the Greek font).

Preliminaries. First, consider regression of variable x (the independent variable) versus y (the dependent variable). The usual formula for the slope is:

  sum_i [(x_i - m_x)*(y_i - m_y)] / sum_i (x_i - m_x)^2     (1)

where summation is over the index i for individual data points, and the means are m_x and m_y. This formula (section 1.2 in N. Draper and H. Smith, Applied Regression Analysis, John Wiley & Sons, Inc., New York, 1966) is correct, and computationally simple and accurate, that is, it works well to preserve floating point accuracy. However, formulae involving descriptive statistics (the correlation or covariance of x and y, and the standard deviations of x and y) convey more information about the factors related to the slope:

  cor(x,y)*s_y/s_x   or   cov(x,y)/s_x^2     (2)

where one can see that the magnitude of the slope increases with the correlation and with the range of the dependent variable y (as measured by the standard deviation), and decreases with the range of the independent variable. If one of the formulae in (2) is used with n data points, it will be accurate (unbiased) if multiplied by the square root of (n-2)/(n-1), to correct for the effect of using estimated rather than true means, and if the usual assumptions, including accurate values for the independent variable, hold. If the range of the independent variable is inflated by errors, the slope will decrease, that is, it will be biased low.

Predicting the slope when precise values of the independent variable x are replaced by estimated or measured values x', following Section 29.56 in M. Kendall and A. Stuart, The Advanced Theory of Statistics: Volume 2: Inference and Relationship, 4th Edition, Charles Griffin & Company Limited, London, 1979 (copy in your mailbox):

Let's assume that the measurements are made without bias and with a precision represented as a standard deviation of the error: the observed measurements (x'_1, x'_2, ...) of the independent variable can be considered as sums of the true values (x_1, x_2, ..., with zero standard error) plus errors (d_1, d_2, ...) with average 0 and standard deviation s_d. The least squares regression slope is then

  cov(x',y)/s_x'^2 = cov(x,y)/(s_x^2 + s_d^2),

where cov(x,y) is the covariance between x and y, i.e. the correlation times the product of the standard deviations of x and y. Now if the least squares slope with no errors is cov(x,y)/s_x^2 = 1, then the slope with the errors is:

  cov(x,y)/(s_x^2 + s_d^2) = (cov(x,y)/s_x^2) * [s_x^2/(s_x^2 + s_d^2)]
                           = s_x^2/(s_x^2 + s_d^2)
                           = 1/[1 + (s_d/s_x)^2]     (3)

The expression on the right is a function of the relative magnitude s_d/s_x of the measurement error to the data range of the independent variable, where the standard deviation is the metric. The range term s_x can be approximated by s_x', the standard deviation of the measured values, if the range of the measurements is large compared to the measurement errors. Otherwise, correct for the effect of measurement error by using s_x^2 = s_x'^2 - s_d^2, leading (as you have noted) to (s_x'^2 - s_d^2)/s_x'^2 as the predicted slope. The estimate of s_d is generally known from an independent source, such as instrument specs or a calibration analysis.

You will note that the slope predicted with (3) is always less than 1. The mathematical cause is the inflation of the denominator from s_x^2 to s_x^2 + s_d^2. Perhaps what is counter-intuitive is that the slope is biased due to mean zero
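A compact numerical version of the correction described above, in Python (my own sketch; the variable names are hypothetical, and the use of the nine-pixel standard deviations as the error estimate follows the suggestion in this post and is an assumption, not a fixed recipe):

import numpy as np

def corrected_regression(ndvi_mean, ndvi_sd_3x3, lai):
    """ndvi_mean: mean NDVI per 3x3 window (measured x); ndvi_sd_3x3: std. dev.
    of the nine pixels in each window; lai: field LAI at the matching points (y)."""
    x, y = np.asarray(ndvi_mean, float), np.asarray(lai, float)

    s_x2 = x.var(ddof=1)                       # variance of the measured x values
    s_d = np.mean(np.asarray(ndvi_sd_3x3))     # error estimate: mean 9-pixel std. dev.
    s_d2 = s_d ** 2                            # assumes s_d2 < s_x2

    slope_ols = np.cov(x, y, ddof=1)[0, 1] / s_x2
    r_ols = np.corrcoef(x, y)[0, 1]

    atten = (s_x2 - s_d2) / s_x2               # predicted attenuation, eq. (3) with s_x ~ s_x'
    slope_corr = slope_ols / atten             # disattenuated slope
    r_corr = r_ols / np.sqrt(atten)            # disattenuated correlation
    return slope_ols, slope_corr, r_ols, r_corr

The correction blows up as the error variance approaches the spread of the measured x, which is the numerical face of the warning above about small measurement ranges.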
AI-GEOSTATS: Call for papers Int Colloquium on LUCC and Env Problems 19-21 December 2003
FYI

Dr Ernan Rustiadi
Laboratory of Land Resources Development Planning
Department of Soil Sciences, Faculty of Agriculture
Bogor Agricultural University (IPB)
Jl. Meranti, Darmaga Campus of IPB. Bogor, INDONESIA
Tel./Fax: +62-251-422322
email: [EMAIL PROTECTED], [EMAIL PROTECTED]
http://www.hdp-ina.org, http://www.hdp-ina.net, http://www.rawg.org

Call for Papers: LUCC and Environmental Problems
International Colloquium, 19-21 December 2003

Uncontrolled exploitation of natural resources is currently reaching an alarming level. The environmental problems created by such exploitation can have consequences of disastrous magnitude. The integrity and conservation of the environment, coupled with sustainable development, is very important for the progress of the nation and for industrial growth. Understanding the dynamics of land use and land cover change processes plays an important role in improving the understanding of the dynamic interactions between human activities and natural resource utilization. The two-day Colloquium 2003 on land use and land cover change will bring together researchers, academics, managers and planners as well as critics from different universities, industries, organizations and agencies from all over the world to participate in the on-going discussions and present their findings and opinions for the promotion of broad knowledge of natural and environmental systems, LUCC, and its issues.

OBJECTIVES
The main purposes of this Colloquium are:
- To present research findings related to land use and land cover change
- To discuss and evaluate the latest methods and issues of land use and land cover change analysis
- To promote co-operation and networking among scientists, experts, practitioners and researchers involved in addressing the latest land use and land cover change

SESSIONS
The scope of this Colloquium covers the relationship between land use and land cover change and: urbanization, deforestation, food problems, water problems.

VENUE
The colloquium will be held at the Hotel Salak and the conference centre of Bogor Agriculture University, situated in a peaceful setting near the Bogor Botanical Garden in the southern part of Jakarta. The colloquium is open to 70 active participants. Participation is charged. Potential participants should refer to the registration form.

ABSTRACT
Contributors should submit a title and a 150-word abstract of their paper by September 30, 2003, by e-mail to [EMAIL PROTECTED], [EMAIL PROTECTED], or [EMAIL PROTECTED], or visit http://www.hdp-ina.org, http://www.hdp-ina.net, http://www.raug.org. Authors of selected abstracts will be notified by October 20, 2003. Full papers should be received before November 30, 2003. Papers presented at the colloquium will be considered for publication in a book.

GUIDANCE FOR AUTHORS
Papers should be written in English, on A4 paper, in Times New Roman font (12 pt), in two columns (except abstracts). Papers should contain the title, the authors and their e-mail addresses, background, methods, discussion, conclusions and references. Abstracts can be sent to the Organizing Committee by e-mail (as an attachment) or by post.

REGISTRATION
Participants can enrol by sending the completed registration form to the Organizing Committee by fax, e-mail or post.
Re: AI-GEOSTATS: Summary: Large sample size and normal distribution
Dear Yetta,

Thanks for the comments, and I agree with you. I think there is a functional relationship between sample size and statistical power: the power increases with the increase of n. It's true that it is hard to define how powerful is "too powerful". Some people suggest using a lower significance level for large n. However, it is then a problem how low (e.g., 0.001) is low enough. Some people suggest not using the p-value at all, as mentioned in the summary.

It is also a question how serious it is if the data set does not follow a normal distribution. Statisticians may provide us with artificial examples showing how serious it can be, but this may not be so serious in the real world if it is only a minor departure. Some people even say that statistical methods cannot be used at all because our samples are not independent, due to spatial autocorrelation. Well, perhaps I have gone too far, but it is an interesting topic. (Geo)statisticians may have better comments.

By the way, I may not summarize again. If anyone would like to share ideas with the list, please copy to it.

Cheers,
Chaosheng

- Original Message -
From: "zij" [EMAIL PROTECTED]
To: "ai-geostats" [EMAIL PROTECTED]; "Chaosheng Zhang" [EMAIL PROTECTED]
Sent: Monday, August 11, 2003 7:13 PM
Subject: RE: AI-GEOSTATS: Summary: Large sample size and normal distribution

Hi, I'm not sure I agree with the idea that a test can be too powerful. This is a common argument in simulation experiments: because you can do an infinite number of replicate simulations, somehow the differences detected are not real. In fact, the differences are real. They may not be biologically (or geologically, or whatever field you are in) significant, but they are still real. That is why it is better to decide first on the magnitude of difference that you consider significant. Now, in the case of deviation from normality, I suppose you wouldn't have much intuition about what is significant, but the relevant question is: what is the effect of small deviations from normality on your test or on the conclusions of your analysis? These kinds of studies are out there in the statistical literature for many tests (t-tests etc.) -- I'm not sure how much has been done to look at the robustness of geostatistical analyses, but there are probably some studies (does anyone know?). I would not opt for a less powerful test just to justify an assumption - that's, like, unethical or something.

Yetta
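The point that the power of a goodness-of-fit test grows with n is easy to see numerically. The quick Python illustration below (mine, not from the thread) applies a normality test to samples from a distribution that is only slightly heavier-tailed than the normal; the p-values tend to shrink dramatically as n grows, even though the departure from normality stays the same.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
for n in (100, 1_000, 10_000, 50_000):
    x = rng.standard_t(df=20, size=n)    # very close to normal, slightly heavy tails
    stat, p = stats.normaltest(x)        # D'Agostino-Pearson omnibus normality test
    print(f"n = {n:>6}:  p = {p:.3g}")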
Re: AI-GEOSTATS: programming ArcView GIS
Hi Ellen,

Every program written for ArcView uses the Avenue language (there are a lot of books from ESRI regarding Avenue programming). However, ArcGIS 8.x uses ArcObjects (practically Visual Basic). ESRI itself has published a book on translating Avenue scripts into ArcObjects scripts. You should also consider that many .apr scripts are protected (for copyright reasons), and thus you cannot read the code directly.

Luigi Maiorano
PhD Student
College of Natural Resources
University of Idaho, Moscow 83843, Idaho (USA)
[EMAIL PROTECTED]

- Original Message -
From: Ellen De Beuckeleer [EMAIL PROTECTED]
Date: Thursday, August 7, 2003 2:57 pm
Subject: AI-GEOSTATS: programming ArcView GIS

Dear List-members,

How can I program applications for ArcView GIS? The book Statistical Analysis with ArcView GIS, by Jay Lee and David Wong, comes with some example scripts, which have the file extension .apr. Unfortunately these files only work with the 3.x version of ArcView. I am using version 8, and for my PhD I would like to learn how to program applications for ArcView GIS, especially the Moran's I index. In which language are .apr files constructed? Are there any good books concerning this issue? Where do I start?

Greets,
Ellen
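Whatever language is used (Avenue, ArcObjects or anything else), the statistic Ellen mentions is compact. A minimal sketch of global Moran's I in Python (my own illustration, not Avenue or ArcObjects code), assuming a precomputed n x n spatial weights matrix W with a zero diagonal:

import numpy as np

def morans_i(values, W):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    n = z.size
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

The real work in any GIS implementation is building the weights matrix (contiguity or distance based), not the formula itself.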
AI-GEOSTATS: Cluster Analysis
I have a point data set (~150k records) that contains a ratio variable (percentages, both positive and negative) that I would like to run a local cluster analysis on. Global Moran's I and Geary's C indicate clustering as a whole; I just need a statistical measure of where. If I understand correctly (a big if at this point), a local Moran's I must be aggregated by region or represent continuous data, which for various reasons I would like to stay away from. If I can correctly define a lag distance, would a local G statistic be better? What other tests may be appropriate for local cluster analysis of discrete point data, without having to use predefined regions or aggregate the data?

Wes Highfield
Graduate Research Assistant
Department of Landscape Architecture and Urban Planning
Texas A&M University
College Station, TX 77843-3137
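If it helps to see what a local G calculation involves once a lag distance is chosen, here is a rough from-scratch Python sketch (my own illustration; far too slow as written for 150k points, and dedicated packages such as PySAL handle this properly) of a local Getis-Ord Gi* with a distance-band neighbourhood and a conditional permutation test:

import numpy as np
from scipy.spatial import cKDTree

def local_g_star(coords, values, band, n_perm=999, seed=0):
    """Local Gi* per point: share of the total attribute found within `band`
    of each point, with a one-sided conditional permutation pseudo p-value."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = len(values)
    tree = cKDTree(coords)
    neighbors = tree.query_ball_point(coords, r=band)   # each list includes the point itself (Gi*)

    total = values.sum()
    g = np.empty(n)
    p = np.empty(n)
    for i, nb in enumerate(neighbors):
        obs = values[nb].sum() / total                  # observed Gi*
        g[i] = obs
        # conditional permutation: hold value i fixed, shuffle the others into the neighbourhood
        others = np.delete(np.arange(n), i)
        k = len(nb) - 1                                  # neighbours excluding i itself
        sims = np.empty(n_perm)
        for s in range(n_perm):
            pick = rng.choice(values[others], size=k, replace=False)
            sims[s] = (values[i] + pick.sum()) / total
        p[i] = (np.sum(sims >= obs) + 1) / (n_perm + 1)  # one-sided (hot-spot) pseudo p-value
    return g, p

Because the permutation test is repeated at every point, some correction for multiple testing (or a false discovery rate approach) is usually applied before mapping the "significant" clusters.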