I apologize for the delay in posting a summary of the responses to my question on data transformation. Here it is: better late than never, I hope. Thanks a lot to all those who responded.

Juliann

The key question about transformations and geostatistics is whether one needs to re-transform. For example, if one uses a log transform (not a logit), one usually wants to re-transform to the original scale, whereas with the indicator transform one does not re-transform. Two difficulties arise: (1) how the variogram of the original variable and the variogram of the transformed variable are related, and (2) in the case of a re-transformation, how to compute the bias. Difficulty (1) is probably not a problem if you are not going to re-transform, but to actually compute the relationship one would need to know the multivariate probability density function (and even then it may be difficult), which is very unlikely in most geostatistical applications.

Donald Meyers

It is always better to use untransformed data if you can. Every complexity you add to your modelling increases your chances of things going wrong exponentially. Prime rule: simpler is always better.

What I do every time I get a new set of data is the following:

(1) Calculate semi-variograms and look at histograms. If the semi-variogram is nice, model it and continue. If not:
(2) Take logarithms and repeat. If still not nice:
(3) Try indicators (lots of them) to see whether you have mixed distributions or something similar. If still not nice:
(4) Try a rank-order (uniform) transform.

If you still don't get nice semi-variograms, there is something BADLY WRONG with your data. Re-assess your basic assumptions:
(a) precise, reproducible data?
(b) accurate, representative data?
(c) homogeneous sampling zones (single populations)?
(d) trend?

Isobel Clark

Handling correlation on the link scale vs. handling it on the untransformed scale is apparently "a topic of discussion in statistics."
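Isobel Clark's checklist can be sketched in Python as a side-by-side comparison of empirical semivariograms on the raw, log, indicator and rank-order (uniform score) scales. This is a minimal sketch assuming NumPy, 2-D coordinates and the classical Matheron estimator with equal-width lag bins; the function names, lag settings and synthetic data are hypothetical, and deciding whether a semivariogram is "nice" is still done by eye.

```python
import numpy as np


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Classical (Matheron) estimator: gamma(h) = mean of (z_i - z_j)^2 / 2 over pairs in each lag bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)             # every pair counted once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    bin_idx = np.floor(dist / lag_width).astype(int)     # bin k covers [k*w, (k+1)*w)
    lags, gammas, counts = [], [], []
    for k in range(n_lags):
        in_bin = bin_idx == k
        if in_bin.any():                                  # skip empty bins
            lags.append(dist[in_bin].mean())
            gammas.append(sq_diff[in_bin].mean() / 2.0)
            counts.append(int(in_bin.sum()))
    return np.array(lags), np.array(gammas), np.array(counts)


def log_transform(z, offset=0.0):
    """Step (2): natural log; a positive offset is needed if any value is zero."""
    return np.log(np.asarray(z, dtype=float) + offset)


def indicator_transform(z, cutoff):
    """Step (3): 1 where z <= cutoff, else 0; repeat for many cutoffs."""
    return (np.asarray(z, dtype=float) <= cutoff).astype(float)


def uniform_score_transform(z):
    """Step (4): rank-order transform to (rank - 0.5) / n; ties are broken by position, not averaged."""
    z = np.asarray(z, dtype=float)
    ranks = np.argsort(np.argsort(z, kind="stable"), kind="stable") + 1
    return (ranks - 0.5) / len(z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.uniform(0, 10, size=(200, 2))        # hypothetical sample locations
    z = rng.lognormal(mean=0.0, sigma=1.0, size=200)   # hypothetical skewed data
    candidates = {
        "raw": z,
        "log": log_transform(z),
        "uniform": uniform_score_transform(z),
    }
    for q in (0.25, 0.5, 0.75):
        candidates[f"indicator <= q{q}"] = indicator_transform(z, np.quantile(z, q))
    for name, v in candidates.items():
        lags, gam, cnt = empirical_semivariogram(coords, v, lag_width=1.0, n_lags=6)
        print(name, np.round(gam, 3))
```

Step (3) calls for many cutoffs; the three quartiles used here are only an example.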
However, the following may help: if you model the covariance on the link scale, you are working with a subject-specific model, whereas a population-averaged model refers to modeling the covariance in the error term. I'd recommend getting a copy of Wolfinger and O'Connell (1993) on generalized linear mixed models.

Fundamentally, your approach may depend on your goals. Are you really trying to explain outcomes using predictor variables? Are you fundamentally interested in the covariance from an ecological perspective? Or are you trying to predict the number of trees per given area? If your goal falls into the first two categories and you have a nonignorable source of nonstationarity, then you can adjust for that nonstationarity using binary or binomial regression. If you have covariates at the tree level, you might want to use the binary route. You'll need to pick a link function; a logit link should get you started. After modeling the mean using logistic regression, you can assess the spatial structure of the residuals by building semivariograms from the Pearson or deviance residuals. If you observe structure, you can model both the nonstationarity *and* the covariance using generalized linear mixed models. If you get this far, you should probably have read the papers below (or their equivalent). You can model spatial variability either as a random effect or as correlated errors. All of this can be done in SAS using PROC LOGISTIC, PROC VARIOGRAM and the GLIMMIX macro, respectively.

Brian

Gotway, C.A. and W.W. Stroup. 1997. A generalized linear model approach to spatial data analysis and prediction. Journal of Agricultural, Biological, and Environmental Statistics 2: 157-178.
Gumpertz, M.L., C. Wu and J.M. Pye. 2000. Logistic regression for Southern Pine Beetle outbreaks with spatial and temporal correlation. Forest Science 46: 95-107.
Wolfinger, R. 1993. Covariance structure selection in general mixed models. Communications in Statistics - Simulation and Computation 22: 1079-1106.
Wolfinger, R. and M. O'Connell. 1993.
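The first two steps Brian describes (fit the mean with a logistic regression, then build a semivariogram from the Pearson residuals) can also be sketched outside SAS. This is a minimal sketch assuming NumPy, binomial counts y out of n trees at each point, and a single covariate (elevation, as in Juliann's question). The logistic regression is fitted by iteratively reweighted least squares; the function names and synthetic data are hypothetical, and the generalized linear mixed model step is not attempted.

```python
import numpy as np


def fit_binomial_logit(X, y, n, max_iter=50, tol=1e-10):
    """Binomial GLM with logit link, fitted by iteratively reweighted least squares (IRLS)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        w = n * p * (1.0 - p)                 # IRLS weights = binomial variance of y
        z = eta + (y - n * p) / w             # working response on the link scale
        XtW = X.T * w                         # X^T W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def pearson_residuals(X, y, n, beta):
    """(observed - expected) / sqrt(binomial variance) at each point."""
    p = 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ beta)))
    n = np.asarray(n, dtype=float)
    return (np.asarray(y, dtype=float) - n * p) / np.sqrt(n * p * (1.0 - p))


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Classical (Matheron) estimator with equal-width lag bins; empty bins are skipped."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    bin_idx = np.floor(dist / lag_width).astype(int)
    lags, gammas, counts = [], [], []
    for k in range(n_lags):
        in_bin = bin_idx == k
        if in_bin.any():
            lags.append(dist[in_bin].mean())
            gammas.append(sq_diff[in_bin].mean() / 2.0)
            counts.append(int(in_bin.sum()))
    return np.array(lags), np.array(gammas), np.array(counts)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    m = 150
    coords = rng.uniform(0, 10000, size=(m, 2))      # hypothetical locations in meters
    elevation = rng.uniform(0, 1000, size=m)          # hypothetical covariate
    n_trees = rng.integers(5, 40, size=m)             # hypothetical trees sampled per point
    elev_std = (elevation - elevation.mean()) / elevation.std()
    p_true = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * elev_std)))
    infected = rng.binomial(n_trees, p_true)
    X = np.column_stack([np.ones(m), elev_std])
    beta = fit_binomial_logit(X, infected, n_trees)
    resid = pearson_residuals(X, infected, n_trees, beta)
    lags, gam, cnt = empirical_semivariogram(coords, resid, lag_width=1000.0, n_lags=8)
    print("coefficients:", np.round(beta, 3))
    print("residual semivariogram:", np.round(gam, 3))
```

Deviance residuals could be swapped in for the Pearson residuals. If the residual semivariogram shows structure, the next step in Brian's outline is the generalized linear mixed model, which this sketch leaves out.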
Generalized Linear Mixed Models: A Pseudo-Likelihood Approach. Journal of Statistical Computation and Simulation 48: 233-243.

Brian Gray

I think the problem might be even more subtle. Essentially you are looking at a marked point process and trying to apply methods designed principally for data that are continuous throughout the sampling domain. I would suggest looking at the following paper:

Stoyan and Waelder. 2000. On variograms in point process statistics II: Models of markings and ecological interpretation. Biometrical Journal 42(2): 171-187.

Another approach you might think about is spatial cdf estimation; take a look at the work of Cressie and friends.

Nicholas Lewin-Koh

Juliann Aukema wrote:

> Hi. I have a question about transforming data.
>
> I have infection prevalence data for many points: the proportion of trees infected, so the numbers are between 0 and 1. Sample size varies among points (because tree density varies). When I plot a variogram of the prevalence data, I get a nice sill out to about 4000 meters and then a rise in the variogram. If I use the residuals from regressing prevalence on elevation, the second rise goes away. Biologically this all makes sense and makes a nice story.
>
> However, for some other analyses of these data, I was advised to logit-transform the prevalence data because they are proportions and should be binomially distributed.
>
> If I plot the variogram of the logit-transformed prevalence, the first sill is much less distinct, if it is there at all. This seems to be mostly due to one point, the last point before the rise, which now goes up instead of being about level with the previous point. (I guess this difference is due to the stretching of zero prevalence values that occurs with the logit transformation.) And if I look at smaller lags, it looks like a power function with no sill.
> Biologically, that is harder to explain. If I plot the variogram of the residuals from regressing logit-transformed prevalence on elevation, it has a nice sill and is similar to, even prettier than, the analysis of the untransformed data (but based on the previous variogram, I don't have a very good reason for plotting the residuals).
>
> My question, then, is whether the logit transformation is necessary and/or appropriate for the geostatistical analysis. Does it make sense to use the transformed data for both variograms, for just the residuals (because the residuals are based on a regression, for which the transformation ought to be done), or for neither?
>
> Thank you very much.
>
> Juliann
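Juliann's guess about zero prevalence can be checked directly. The plain logit log(p/(1 - p)) is minus infinity at p = 0, so some adjustment must be made before a zero-prevalence point can enter a variogram at all. A common choice, assumed here because the thread does not say which adjustment was used, is the empirical logit log((y + 0.5)/(n - y + 0.5)) for y infected trees out of n. The sketch below (NumPy; function names and counts are hypothetical) shows that two points with zero prevalence, identical on the proportion scale, land far apart on the logit scale when their sample sizes differ, which inflates the semivariance between them.

```python
import numpy as np


def raw_logit(p):
    """Plain logit; returns -inf at p = 0 and +inf at p = 1."""
    p = np.asarray(p, dtype=float)
    with np.errstate(divide="ignore"):
        return np.log(p / (1.0 - p))


def empirical_logit(y, n):
    """Empirical logit for y infected trees out of n; finite even when y = 0 or y = n."""
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    return np.log((y + 0.5) / (n - y + 0.5))


if __name__ == "__main__":
    infected = np.array([0, 0, 2, 10])   # hypothetical counts
    trees = np.array([5, 50, 20, 20])    # hypothetical sample sizes
    prevalence = infected / trees
    el = empirical_logit(infected, trees)
    print("prevalence:     ", prevalence)
    print("raw logit:      ", raw_logit(prevalence))
    print("empirical logit:", np.round(el, 3))
    # Half squared difference between the two zero-prevalence points,
    # i.e. their contribution to the semivariogram on each scale.
    print("proportion scale:", 0.5 * (prevalence[0] - prevalence[1]) ** 2)
    print("logit scale:     ", round(float(0.5 * (el[0] - el[1]) ** 2), 3))
```

The binomial GLM route Brian describes models the counts directly, so it avoids having to transform zero proportions at all.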