I have applied the update to the documentation (although that function needs a general rewrite - later...)
>On Mon, Aug 15, 2011 at 8:53 AM, Andrea Gavana <andrea.gav...@gmail.com> wrote:
>
>> Hi Chris and All,
>>
>> On 12 August 2011 16:53, Christopher Jordan-Squire wrote:
>> > Hi Andrea--An easy way to get something like this would be
>> >
>> > import numpy as np
>> > import scipy.stats as stats
>> >
>> > sigma = # some reasonable standard deviation for your application
>> > x = stats.norm.rvs(size=1000, loc=125, scale=sigma)
>> > x = x[x > 50]
>> > x = x[x < 200]
>> >
>> > That will give a roughly normal distribution to your velocities, as
>> > long as, say, sigma < 25. (I'm using the rule of thumb for the normal
>> > distribution that normal random samples lie 3 standard deviations away
>> > from the mean about 1 out of 350 times.) Though you won't be able to
>> > get exactly normal errors about your mean, since normal random samples
>> > can theoretically be of any size.
>> >
>> > You can use this same process for any other distribution, as long as
>> > you've chosen a scale variable so that the probability of samples
>> > being outside your desired interval is really small. Of course, once
>> > again your random errors won't be exactly from the distribution you
>> > got your original samples from.
>>
>> Thank you for your suggestion. There are a couple of things I am not
>> clear on, however. The first one (the easy one) is: let's suppose I
>> need 200 values, and the accept/discard procedure removes 5 of them
>> from the list. Is there any way to draw these 200 values from a bigger
>> sample so that the accept/reject procedure will not interfere too
>> much? And how do I get 200 values out of the bigger sample so that
>> these values are still representative?
>
>FWIW, I'm not really advocating a truncated normal so much as making the
>standard deviation small enough that there's no real difference between a
>true normal distribution and a truncated one.
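[The oversample-and-filter idea above can be sketched in a few lines. The
sigma value, sample sizes, and the final random pick are my illustrative
assumptions, not anything from the original post:]

```python
import numpy as np
from scipy import stats

np.random.seed(0)  # only so this sketch is reproducible

sigma = 20.0  # assumed "reasonable" scale: small enough that values
              # outside [50, 200] are rare (mean is 3.75 sigma from 50)
x = stats.norm.rvs(size=1000, loc=125, scale=sigma)
x = x[(x > 50) & (x < 200)]  # discard the (rare) out-of-range draws

# If exactly 200 values are needed, oversample as above and pick 200 at
# random from the survivors -- an unbiased subsample.
velocities = np.random.choice(x, size=200, replace=False)
```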
>If you're worried about getting exactly 200 samples, then you could sample
>N with N > 200, such that after throwing out the ones that lie outside your
>desired region you're left with M > 200. Then just randomly pick 200 from
>those M. That shouldn't bias anything, as long as you pick them at random.
>(Or just take the first 200, if you haven't done anything to impose an
>order on the samples, such as sorting them by size.) But I'm not sure why
>you'd want exactly 200 samples instead of some number of samples close to
>200.
>
>> Another thing, possibly completely unrelated. I am trying to design a
>> toy Latin Hypercube script (just for my own understanding). I found
>> this piece of code on the web (and I modified it slightly):
>>
>> def lhs(dist, size=100):
>>     '''
>>     Latin Hypercube sampling of any distribution.
>>     dist is a scipy.stats random number generator,
>>     such as stats.norm, stats.beta, etc.
>>
>>     :Parameters:
>>       - `dist`: random number generator from the scipy.stats module.
>>       - `size`: size of the output sample.
>>     '''
>>
>>     n = size
>>
>>     perc = numpy.arange(0.0, 1.0, 1.0/n)
>>     numpy.random.shuffle(perc)
>>
>>     smp = [stats.uniform(i, 1.0/n).rvs() for i in perc]
>>
>>     v = dist.ppf(smp)
>>
>>     return v
>>
>> Now, I am not 100% clear on what the percent point function is (I have
>> read around the web, but please keep in mind that my statistical
>> skills are close to minus infinity). From this page:
>>
>> http://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm
>
>The ppf is what's called the quantile function elsewhere. I do not know why
>scipy calls it the ppf/percent point function.
>
>The quantile function is the inverse of the cumulative distribution
>function (cdf). So dist.ppf(z) is the x such that P(dist <= x) = z.
>Roughly.
>(Things get slightly more finicky if you think about discrete
>distributions, because then you have to pick what happens at the jumps in
>the cdf.) So dist.ppf(0.5) gives the median of dist, and dist.ppf(0.25)
>gives the lower/first quartile of dist.
>
>> I gather that, if you plot the results of the ppf, with the horizontal
>> axis as probability, the vertical axis goes from the smallest to the
>> largest value of the cumulative distribution function. If I do this:
>>
>> numpy.random.seed(123456)
>>
>> distribution = stats.norm(loc=125, scale=25)
>>
>> my_lhs = lhs(distribution, 50)
>>
>> Will my_lhs always contain valid values (i.e., included between 50 and
>> 200)? I assume the answer is no... but even if this was the case, is
>> this my_lhs array ready to be used to set up an LHS experiment when I
>> have multi-dimensional problems (in which all the variables are
>> completely independent from each other - no correlation)?
>
>I'm not really sure the above function is doing the LHS you want. To
>answer your question: no, it won't always generate values within [50, 200].
>If size is large enough, then you're dividing up the probability space
>evenly. So even with the random perturbations (whose use I don't really
>understand), you'll ensure that some of the samples you get when you apply
>the ppf correspond to the extremely low probability regions that are < 50
>or > 200.
>
>-Chris JS
>
>> My apologies for the idiocy of the questions.
>>
>> Andrea.
>>
>> "Imagination Is The Only Weapon In The War Against Reality."
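[To make the ppf/quantile discussion concrete, here is a small check of the
inverse-cdf relationship, plus a cleaned-up run of the lhs function above
showing the stratification it produces. The tolerances and the stratum
check are my additions, not part of the original script:]

```python
import numpy as np
from scipy import stats

# ppf inverts the cdf: ppf(0.5) is the median, ppf(0.25) the first quartile.
dist = stats.norm(loc=125, scale=25)
assert abs(dist.ppf(0.5) - 125.0) < 1e-9
assert abs(dist.cdf(dist.ppf(0.25)) - 0.25) < 1e-9

def lhs(dist, size=100):
    """Toy 1-D Latin Hypercube sample from dist (cleaned-up version)."""
    n = size
    perc = np.arange(0.0, 1.0, 1.0 / n)  # left edges of n equal strata
    np.random.shuffle(perc)              # visit the strata in random order
    # one uniform draw inside each probability stratum [p, p + 1/n)
    smp = [stats.uniform(p, 1.0 / n).rvs() for p in perc]
    return dist.ppf(smp)                 # map probabilities back to values

np.random.seed(123456)
sample = lhs(dist, 50)

# Each of the 50 equal-probability strata contributes exactly one point, so
# consecutive sorted cdf values are never more than 2/50 apart...
u = np.sort(dist.cdf(sample))
assert np.all(np.diff(u) < 2.0 / 50 + 1e-9)
# ...but nothing confines the values themselves to [50, 200]: the lowest
# stratum covers probabilities [0, 0.02), which the ppf maps into the tail.
```

For the multi-dimensional case with independent variables, the usual approach is one independent shuffle of the strata per variable (one column per dimension); scipy's own `stats.qmc.LatinHypercube` (available in recent scipy versions) does this for you on the unit hypercube.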
>> http://xoomer.alice.it/infinity77/
>>
>> >>> import PyQt4.QtGui
>> Traceback (most recent call last):
>>   File "<interactive input>", line 1, in <module>
>> ImportError: No module named PyQt4.QtGui
>> >>>
>> >>> import pygtk
>> Traceback (most recent call last):
>>   File "<interactive input>", line 1, in <module>
>> ImportError: No module named pygtk
>> >>>
>> >>> import wx
>> >>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion

-- 
-----------------------------------------------------------------------
| Alan K. Jackson            | To see a World in a Grain of Sand      |
| a...@ajackson.org          | And a Heaven in a Wild Flower,         |
| www.ajackson.org           | Hold Infinity in the palm of your hand |
| Houston, Texas             | And Eternity in an hour. - Blake       |
-----------------------------------------------------------------------