Hello, I'm trying to find a way to collect a set of values from real data and then sample values randomly from it, so that the data I collect becomes a kind of probability distribution. For instance, I might have age data for some children. It's very easy to collect this data using a list, where the index gives the value and the entry at that index gives the number of times that value occurs:
[0,0,10,20,5] could mean that there are no people aged 0, no people aged 1, 10 people aged 2, 20 people aged 3, and 5 people aged 4 in my data collection. I then want to draw a random sample that is representative of these proportions. Is there an easy and fast way to select an index weighted by its count? Or are there any Python packages that let you easily create your own distribution from collected data?

Two other things to bear in mind: in reality I'm collating data from up to around 5 million individuals, so just making one long list with a new entry for each individual won't work. Also, it would be good if I didn't have to decide beforehand what the possible range of values is (which unfortunately I have to do with the approach I'm currently working on).

Thanks in advance for your help,
elsa.
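
To make the question concrete, here is a minimal sketch of the kind of thing being asked for. It is one possible approach, not a settled answer. It assumes Python 3.6 or later (for random.choices) and swaps the fixed-length list for a collections.Counter, so the range of values does not have to be known in advance. The helper names build_counts and sample are made up for illustration.

```python
import random
from collections import Counter


def build_counts(values):
    """Tally how often each value occurs; memory grows with the number
    of distinct values, not with the number of individuals."""
    counts = Counter()
    for v in values:
        counts[v] += 1
    return counts


def sample(counts, k, rng=random):
    """Draw k values with probability proportional to their counts."""
    values = list(counts)
    weights = [counts[v] for v in values]
    return rng.choices(values, weights=weights, k=k)


# Same information as the list [0, 0, 10, 20, 5]:
# 10 people aged 2, 20 aged 3, 5 aged 4 (ages 0 and 1 simply do not appear).
counts = Counter({2: 10, 3: 20, 4: 5})
draws = sample(counts, k=1000)
```

With 5 million individuals the Counter still holds only one entry per distinct age, so the per-individual list is never built.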