On Tue, Jun 10, 2008 at 12:56 AM, Anne Archibald <[EMAIL PROTECTED]> wrote:
> 2008/6/9 Keith Goodman <[EMAIL PROTECTED]>:
>> Does anyone have a function that converts ranks into a Gaussian?
>>
>> I have an array x:
>>
>>>> import numpy as np
>>>> x = np.random.rand(5)
>>
>> I rank it:
>>
>>>> x = x.argsort().argsort()
>>>> x_ranked = x.argsort().argsort()
>>>> x_ranked
>> array([3, 1, 4, 2, 0])
>>
>> I would like to convert the ranks to a Gaussian without using scipy.
>> So instead of the equal distance between ranks in array x, I would
>> like the distance between them to follow a Gaussian distribution.
>>
>> How far out in the tails of the Gaussian should 0 and N-1 (N=5 in the
>> example above) be? Ideally, or arbitrarily, the areas under the
>> Gaussian to the left of 0 (and the right of N-1) should be 1/N or
>> 1/2N. Something like that. Or a fixed value is good too.
>
> I'm actually not clear on what you need.
>
> If what you need is for rank i of N to be the 100*i/N th percentile in
> a Gaussian distribution, then you should indeed use scipy's functions
> to accomplish that; I'd use scipy.stats.norm.ppf().
>
> Of course, if your points were drawn from a Gaussian distribution,
> they wouldn't be exactly 1/N apart, there would be some distribution.
> Quite what the distribution of (say) the maximum or the median of N
> points drawn from a Gaussian is, I can't say, though people have
> looked at it. But if you want "typical" values, just generate N points
> from a Gaussian and sort them:
>
> V = np.random.randn(N)
> V = np.sort(V)
>
> return V[ranks]
>
> Of course they will be different every time, but the distribution will be
> right.

I guess I botched the description of my problem. I have data that
contains outliers and other noise. I am trying various transformations
of the data to preprocess it before plugging it into my prediction
algorithm. One such transformation is to rank the data and then convert
that rank to a Gaussian. The particular details of the transformation
don't matter. I just want something smooth and normal-like.

> Anne
> P.S. why the "no scipy" restriction? it's a bit unreasonable. -A

I'd rather not pull in a scipy dependency for one function if there is
a numpy alternative. I think it is funny that you picked up on my brief
mention of scipy and called it unreasonable.
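
For concreteness, here is a rough sketch of the kind of rank-to-Gaussian transform described above, with no scipy involved. It is only one possible approach, under stated assumptions: the function name ranks_to_gaussian is made up for illustration, the inverse normal CDF comes from the standard library's statistics.NormalDist (Python 3.8 or later, so newer than this thread), and rank i of N is mapped to probability (i + 0.5)/N, which leaves an area of 1/2N in each tail.

```python
import numpy as np
from statistics import NormalDist  # standard library, Python 3.8+ (assumption)


def ranks_to_gaussian(ranks):
    """Map integer ranks 0..N-1 to standard normal quantiles.

    Hypothetical helper for illustration. Rank i of N is assigned
    probability (i + 0.5) / N, so the area to the left of rank 0 and
    to the right of rank N-1 is 1/(2N).
    """
    ranks = np.asarray(ranks, dtype=float)
    n = ranks.size
    # Probabilities lie strictly inside (0, 1), as inv_cdf requires.
    p = (ranks + 0.5) / n
    inv_cdf = NormalDist().inv_cdf  # inverse CDF of N(0, 1)
    z = np.array([inv_cdf(q) for q in p.ravel()])
    return z.reshape(ranks.shape)


# Rank the data as in the original question, then transform the ranks.
x = np.random.rand(5)
x_ranked = x.argsort().argsort()
x_gauss = ranks_to_gaussian(x_ranked)
```

The result keeps the ordering of the original data, is symmetric about zero, and is the same on every call for a given set of ranks, unlike the approach of sorting random normal draws. The Python-level loop is slow for very large arrays; interpolating a precomputed table with np.interp would be one way around that.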