On Mon, Jun 9, 2008 at 7:35 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
> On Mon, Jun 9, 2008 at 21:06, Keith Goodman <[EMAIL PROTECTED]> wrote:
>> On Mon, Jun 9, 2008 at 4:45 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
>>> On Mon, Jun 9, 2008 at 18:34, Keith Goodman <[EMAIL PROTECTED]> wrote:
>>>> Does anyone have a function that converts ranks into a Gaussian?
>>>>
>>>> I have an array x:
>>>>
>>>>>> import numpy as np
>>>>>> x = np.random.rand(5)
>>>>
>>>> I rank it:
>>>>
>>>>>> x_ranked = x.argsort().argsort()
>>>>>> x_ranked
>>>> array([3, 1, 4, 2, 0])
>>>
>>> There are subtleties in computing ranks when ties are involved. Take a
>>> look at the implementation of scipy.stats.rankdata().
>>
>> Good point. I had to deal with ties and missing data. I bet
>> scipy.stats.rankdata() is faster than my implementation.
>
> Actually, it's pretty slow. I think there are opportunities to make it
> faster, but I haven't explored them.
>
>>>> I would like to convert the ranks to a Gaussian without using scipy.
>>>
>>> No dice. You are going to have to use scipy.special.ndtri somewhere. A
>>> basic transformation (off the top of my head, I have no idea if this
>>> is statistically meaningful):
>>>
>>> scipy.special.ndtri((ranks + 1.0) / (len(ranks) + 1.0))
>>>
>>> Barring tied first or last items, this should give equal weight to
>>> each of the tails outside of the range of your data.
>>
>> Nice. Thank you. It passes the never-wrong chi-by-eye test:
>>
>> r = np.arange(1000)
>> g = special.ndtri((r + 1.0) / (len(r) + 1.0))
>> pylab.hist(g, 50)
>> pylab.show()
>
> BTW, what is your use case? If you are trying to compare your data to
> the normal distribution, you might want to go the other way: find the
> empirical quantiles of your data and compare them against the
> hypothetical quantiles of a sample that size on the distribution.
> This *is* an
> established statistical technique; when you graph one versus the
> other, you get what is known as a Q-Q plot.
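A minimal sketch of the Q-Q comparison described above (not from the thread itself): sort the data to get empirical quantiles and evaluate the inverse normal CDF at the plotting positions (i + 1)/(n + 1), the same positions used in the rank transform earlier in the thread. Plotting is omitted; a correlation coefficient stands in for the chi-by-eye straight-line check.

```python
import numpy as np
from scipy import special

# Synthetic normal data so the Q-Q comparison should come out straight.
rng = np.random.RandomState(0)
x = rng.normal(size=1000)

# Empirical quantiles: simply the sorted data.
empirical = np.sort(x)

# Theoretical quantiles: inverse normal CDF at (i + 1) / (n + 1).
n = len(x)
theoretical = special.ndtri((np.arange(n) + 1.0) / (n + 1.0))

# For normal data the (theoretical, empirical) points lie near a line;
# the correlation of the two quantile sets should be very close to 1.
r = np.corrcoef(theoretical, empirical)[0, 1]
```

Plotting `empirical` against `theoretical` (e.g. with pylab.plot) gives the Q-Q plot itself.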
Some of the techniques I use are sensitive to noise. So to make things
more robust I sometimes transform my noisy, sort-of-normal data.
There's no theory to guide me, only empirical results.

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
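Putting the pieces of the thread together, a tie-aware version of Robert's transform can be sketched as follows (an illustration, not code from the thread). scipy.stats.rankdata returns 1-based ranks with ties averaged, so the `+ 1` shift Robert applied to 0-based argsort ranks drops out; the `rank_to_gaussian` name is invented here.

```python
import numpy as np
from scipy import special, stats

def rank_to_gaussian(x):
    """Map values to normal scores via tie-aware ranks."""
    # rankdata gives 1-based ranks and averages ties, unlike
    # argsort().argsort(), which breaks ties arbitrarily.
    ranks = stats.rankdata(x)
    # Rank k of n maps to the k / (n + 1) normal quantile, leaving
    # equal tail mass outside the range of the data.
    return special.ndtri(ranks / (len(ranks) + 1.0))

x = np.array([0.3, 0.1, 0.4, 0.1, 0.5])
g = rank_to_gaussian(x)
```

Here the two tied 0.1 values share the averaged rank 1.5 and therefore map to the same normal score, and the middle rank (3 of 5) lands at the median, 0.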