On Mon, Jun 9, 2008 at 21:06, Keith Goodman <[EMAIL PROTECTED]> wrote: > On Mon, Jun 9, 2008 at 4:45 PM, Robert Kern <[EMAIL PROTECTED]> wrote: >> On Mon, Jun 9, 2008 at 18:34, Keith Goodman <[EMAIL PROTECTED]> wrote: >>> Does anyone have a function that converts ranks into a Gaussian? >>> >>> I have an array x: >>> >>>>> import numpy as np >>>>> x = np.random.rand(5) >>> >>> I rank it: >>> >>>>> x = x.argsort().argsort() >>>>> x_ranked = x.argsort().argsort() >>>>> x_ranked >>> array([3, 1, 4, 2, 0]) >> >> There are subtleties in computing ranks when ties are involved. Take a >> look at the implementation of scipy.stats.rankdata(). > > Good point. I had to deal with ties and missing data. I bet > scipy.stats.rankdata() is faster than my implementation.
Actually, it's pretty slow. I think there are opportunities to make it faster, but I haven't explored them. >>> I would like to convert the ranks to a Gaussian without using scipy. >> >> No dice. You are going to have to use scipy.special.ndtri somewhere. A >> basic transformation (off the top of my head, I have no idea if this >> is statistically meaningful): >> >> scipy.special.ndtri((ranks + 1.0) / (len(ranks) + 1.0)) >> >> Barring tied first or last items, this should give equal weight to >> each of the tails outside of the range of your data. > > Nice. Thank you. It passes the never wrong chi-by-eye test: > > r = np.arange(1000) > g = special.ndtri((r + 1.0) / (len(r) + 1.0)) > pylab.hist(g, 50) > pylab.show() BTW, what is your use case? If you are trying to compare your data to the normal distribution, you might want to go the other way: find the empirical quantiles of your data and compare them against the hypothetical quantiles of the data on the distribution. This *is* an established statistical technique; when you graph one versus the other, you get what is known as a Q-Q plot. > I wasn't able to use scipy.special.ndtri (after import scipy) like you > did. Actually, I didn't specify any import. The behavior you see is correct. Importing scipy alone doesn't import any of the subpackages. If you like, you can pretend there was an implicit "import scipy.special" before the code I gave. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco _______________________________________________ Numpy-discussion mailing list [email protected] http://projects.scipy.org/mailman/listinfo/numpy-discussion
