Re: [R-sig-eco] Log transforming zero value data

Carsten Dormann Wed, 24 Jun 2009 00:47:11 -0700

Dear Nate,

Although I learned about the existence of log1p from Phillippe's response, I don't think I will use it (for the reasons below). Thierry's response is true for Poisson data, but not for non-integer values. Still, it points in an important direction: all too often, zeros emanate from a different process than the other values (see mixed distributions, zero-inflated models, hurdle models and all that). In that case, you should consult Ben Bolker's excellent book (which is probably still available as a draft on his homepage, but is also worth buying).


If you want to transform, here is my take:

My folklore guidelines on the c in log(x+c) are:

1. c should be roughly 1/2 of the smallest non-zero value: signif(0.5*sort(unique(x))[2], 2)
2. c should be the square of the first quartile divided by the third quartile: (quantile(x)[2]^2)/quantile(x)[4]

For example:
set.seed(11011)
x <- c(runif(95), rep(0,5))

Method 1: c=0.0015
Method 2: c=0.015

While this looks like a huge difference (an order of magnitude), it actually isn't all that much, given the range of the data:


plot(density(x))
abline(v=c(0.0015, 0.015))
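
For anyone who wants to recompute the two constants rather than type them in, here is a minimal sketch. The helper names c_half_min and c_quartile are made up for this illustration and do not come from any package. Method 1 is written with an explicit x > 0 filter, which gives the same result as sort(unique(x))[2] whenever zero is the smallest value in the data.

```r
set.seed(11011)
x <- c(runif(95), rep(0, 5))

# Method 1: half of the smallest non-zero value, rounded to 2 significant digits.
# Filtering on x > 0 avoids relying on zero being the smallest unique value.
c_half_min <- function(x) signif(0.5 * min(x[x > 0]), 2)

# Method 2: squared first quartile divided by the third quartile.
c_quartile <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  q[1]^2 / q[2]
}

c1 <- c_half_min(x)
c2 <- c_quartile(x)
c(method1 = c1, method2 = c2)

# Mark both candidate constants on the density of the raw data.
plot(density(x))
abline(v = c(c1, c2), lty = c(1, 2))
```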

I do have a reference for method 2, but it is in German (Stahel, W. A. (2002) Statistische Datenanalyse. Eine Einführung für Naturwissenschaftler. Vieweg, Braunschweig.).

Method 1 is what the statistics adviser for my PhD recommended. Since he was right about everything else, I rely on his advice here, too. That may, I acknowledge, not be good enough for you. But maybe someone else can find a proper reference.

The key thing for any value of c is that it doesn't distort the analysis. But then, how do you detect distortion? I compared an analysis of rank-transformed data with analyses using various values of c. When c was large (in the current example, e.g. 0.5 or so), the analysis started to differ from the rank analysis. To use log1p (i.e. c = 1) here would be a dramatic distortion!
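
One way such a comparison could be set up is sketched below. The covariate z and the choice of a linear-model t statistic as the quantity to compare are assumptions made purely for illustration; the actual analysis behind the statement above is not specified in this post.

```r
set.seed(11011)
y <- c(runif(95), rep(0, 5))

# Hypothetical covariate, loosely related to y (illustration only).
z <- y + rnorm(length(y), sd = 0.3)

# Reference analysis: regression on the rank-transformed response.
t_rank <- summary(lm(rank(y) ~ z))$coefficients["z", "t value"]

# Same model on log(y + c) for a range of constants; c = 1 is log1p(y).
cs <- c(0.0015, 0.015, 0.1, 0.5, 1)
t_log <- sapply(cs, function(cc) {
  summary(lm(log(y + cc) ~ z))$coefficients["z", "t value"]
})

# Side-by-side table: how far does each c move the test statistic
# away from the rank-based reference?
data.frame(c = cs, t_log = t_log, t_rank = t_rank)
```

With real data, z would be the actual predictor and the model would be whatever analysis is planned (e.g. quantile regression); the idea is only to watch how the result drifts as c changes.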

Another way to look at it is through the Box-Cox transformation. Since Box-Cox transforms towards a symmetric (not necessarily normal) distribution, c should also be chosen in such a way as to facilitate the transformation towards symmetry.
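
A rough sketch of the symmetry idea: pick the c that minimises the absolute sample skewness of log(x + c). This is not the Box-Cox procedure itself, only an illustration of "choose c for symmetry". The skewness helper, the search bounds and the use of optimize() are my own assumptions, and optimize() may settle on a local minimum.

```r
set.seed(11011)
x <- c(runif(95), rep(0, 5))

# Moment-based sample skewness; zero for a perfectly symmetric sample.
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# Absolute skewness of log(x + c) as a function of c.
abs_skew <- function(cc) abs(skewness(log(x + cc)))

# Search c on a log scale between two illustrative bounds.
fit <- optimize(function(lc) abs_skew(exp(lc)), interval = log(c(1e-6, 10)))
c_sym <- exp(fit$minimum)
c_sym

# Visual check of the transformed data.
hist(log(x + c_sym))
```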


HTH,

Carsten


Nate Upham wrote:

I have a general stats question for you guys:

How does one normally deal with zero (0) values when log transforming data?

I would like to log transform (natural log, ln) several response variables for use in quantile regression.  But one of my variables includes several zero values.  Since ln(0) = -infinity, this is not readily possible.  Is it best to remove all data with zero values?  Or should I add a very small number to each value (e.g., 0.00001)?  This seems problematic.  Is there an easy way to address this issue?

Thanks much for your help,
--Nate

_________________________________
Nathan S. Upham
Ph.D. student
Committee on Evolutionary Biology
University of Chicago
1025 E. 57th St., Culver 402
Chicago, IL 60637
nsup...@uchicago.edu

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


--
Dr. Carsten F. Dormann
Department of Computational Landscape Ecology

Helmholtz Centre for Environmental Research-UFZ
Permoserstr. 15

04318 Leipzig
Germany

Tel: ++49(0)341 2351946
Fax: ++49(0)341 2351939
Email: carsten.dorm...@ufz.de
internet: http://www.ufz.de/index.php?de=4205
