On Mon, Jun 9, 2008 at 21:06, Keith Goodman <[EMAIL PROTECTED]> wrote:
> On Mon, Jun 9, 2008 at 4:45 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
>> On Mon, Jun 9, 2008 at 18:34, Keith Goodman <[EMAIL PROTECTED]> wrote:
>>> Does anyone have a function that converts ranks into a Gaussian?
>>>
>>> I have an array x:
>>>
>>>>> import numpy as np
>>>>> x = np.random.rand(5)
>>>
>>> I rank it:
>>>
>>>>> x = x.argsort().argsort()
>>>>> x_ranked = x.argsort().argsort()
>>>>> x_ranked
>>>   array([3, 1, 4, 2, 0])
>>
>> There are subtleties in computing ranks when ties are involved. Take a
>> look at the implementation of scipy.stats.rankdata().
>
> Good point. I had to deal with ties and missing data. I bet
> scipy.stats.rankdata() is faster than my implementation.

Actually, it's pretty slow. I think there are opportunities to make it
faster, but I haven't explored them.

>>> I would like to convert the ranks to a Gaussian without using scipy.
>>
>> No dice. You are going to have to use scipy.special.ndtri somewhere. A
>> basic transformation (off the top of my head, I have no idea if this
>> is statistically meaningful):
>>
>>  scipy.special.ndtri((ranks + 1.0) / (len(ranks) + 1.0))
>>
>> Barring tied first or last items, this should give equal weight to
>> each of the tails outside of the range of your data.
>
> Nice. Thank you. It passes the never wrong chi-by-eye test:
>
> r = np.arange(1000)
> g = special.ndtri((r + 1.0) / (len(r) + 1.0))
> pylab.hist(g, 50)
> pylab.show()

BTW, what is your use case? If you are trying to compare your data to
the normal distribution, you might want to go the other way: find the
empirical quantiles of your data and compare them against the
hypothetical quantiles of the data on the distribution. This *is* an
established statistical technique; when you graph one versus the
other, you get what is known as a Q-Q plot.

> I wasn't able to use scipy.special.ndtri (after import scipy) like you
> did.

Actually, I didn't specify any import. The behavior you see is
correct. Importing scipy alone doesn't import any of the subpackages.
If you like, you can pretend there was an implicit "import
scipy.special" before the code I gave.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco
_______________________________________________
Numpy-discussion mailing list
[email protected]
http://projects.scipy.org/mailman/listinfo/numpy-discussion

Reply via email to