I have been reading, in various sources, that the Poisson distribution is related to the binomial, extending the idea to counts of events in a given period of time.
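For reference, the standard form of that relationship is a limit: if X ~ Binomial(n, p) with n growing and p = mu/n shrinking so that the mean mu stays fixed, then P(X = k) tends to exp(-mu) * mu^k / k!, the Poisson(mu) probability. The short sketch below is written in Python only for illustration (the same few lines translate directly to R's dbinom and dpois). The helper names are hypothetical and mu = 3 is an arbitrary value. It checks numerically that the largest pointwise gap between the two probability mass functions shrinks as n grows.

```python
from math import comb, exp, factorial


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)


def max_abs_diff(n, mu, kmax=20):
    """Largest |Binomial(n, mu/n) - Poisson(mu)| pmf gap over k = 0..kmax."""
    p = mu / n
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(kmax + 1))


if __name__ == "__main__":
    mu = 3.0  # expected events per period; an arbitrary illustrative value
    for n in (10, 100, 1000, 10000):
        # The gap shrinks roughly like 1/n as n grows with mu held fixed.
        print(n, max_abs_diff(n, mu))
```

The same reasoning applies one step further back. A hypergeometric draw from a population that is large relative to the sample behaves like a binomial, so in the corresponding limit it is also approximately Poisson.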
In my case, the hypergeometric distribution seems more appropriate, but I need a temporal dimension to the distribution.

I have weekly samples of two kinds of events: call them A and B. I have weekly counts of A events, and these change dramatically from one week to the next. I also have weekly counts of B events that I can relate to A events. Some fraction 'lambda' (between 0 and 1) of A events will result in B events at some time in the future (though sometimes in the same week that the related A event occurred). The B event related to a given A event can occur as much as ten weeks after the A event. B events cannot occur without a prior A event, and well over half of the A events will never produce a B event. Also, we know that a given A event cannot produce more than one B event.

Hence the hypergeometric is much more appropriate than the binomial, and thus my need for the distribution that has the same relation to the hypergeometric that the Poisson has to the binomial. Since the hypergeometric is related to the binomial, would the Poisson also be related to the hypergeometric?

My data is best expressed as a fraction: the number of B events in a given week divided by the number of A events producing the B events. I.e. if there are 500 A events in week n, the data would be the number of related B events in week m (m >= n) divided by 500. The first table I get from the DB has records containing an ordered pair (week number, fraction), where the week number is m - n, so week 0 is the week of the A events. E.g.

0,0.2
1,0.3
2,0.25
3,0.2
...

The above is dummy data, but the pattern I see in the real data is that the number of B events in week 0 is less than the number in week 1, and from then on the number of B events declines exponentially (as you'd expect from what could be described as a decay process, altered to reflect the fact that over half of the original A events will never produce B events). Of all the distributions I tried on this data, the exponential and the Poisson produced the best fits, with very little to choose between them.
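One way to get the temporal dimension without inventing a new distribution is a mixture model, sometimes called a "cure fraction" or defective-distribution model. In this model, each A event produces a B event with probability lambda, and if it does, the delay in weeks follows some delay distribution g. The expected weekly fraction is then lambda * g(d), so the cumulative fraction of A events that have produced B events levels off at lambda rather than at 1.

The sketch below is a minimal illustration under stated assumptions, not a definitive method. It assumes a Poisson-shaped delay, chosen only because the Poisson fitted well; with 1 < mu < 2 that shape rises from week 0 to week 1 and then declines. A geometric or discretized gamma delay would slot in the same way. The sketch treats the weekly fractions as multinomial proportions and maximizes the per-A-event log-likelihood. For a fixed mean delay mu, the MLE of lambda has the closed form S / G: the observed cumulative fraction divided by the model probability of a delay of at most D weeks. Only mu therefore needs a one-dimensional grid search. All function names are hypothetical, the 0..10-week range follows the ten-week maximum delay described above, and the synthetic lambda = 0.35, mu = 1.5 are invented test values.

```python
from math import exp, factorial, log


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu); used here as an assumed delay distribution."""
    return exp(-mu) * mu**k / factorial(k)


def profile_loglik(fractions, mu):
    """Per-A-event log-likelihood at mean delay mu, with lambda profiled out.

    fractions[d] = (B events d weeks after the A week) / (A events that week),
    for d = 0..D. Assumed model: an A event yields a B event with probability
    lam, and the delay of that B event is Poisson(mu), observed up to week D.
    """
    g = [poisson_pmf(d, mu) for d in range(len(fractions))]
    G = sum(g)               # model probability that a delay is <= D weeks
    S = sum(fractions)       # observed fraction of A events with a B event by week D
    lam = min(1.0, S / G)    # closed-form MLE of lam for this mu (capped at 1)
    ll = sum(f * log(lam * gd) for f, gd in zip(fractions, g) if f > 0)
    rest = 1.0 - S           # A events with no B event by week D
    if rest > 0:
        ll += rest * log(1.0 - lam * G)
    return ll, lam


def fit_delay_model(fractions, mu_grid=None):
    """Grid-search MLE of (mu, lam); a sketch, not a production optimizer."""
    if mu_grid is None:
        mu_grid = [i / 100 for i in range(5, 1001)]  # mu = 0.05 .. 10.00
    best_mu = max(mu_grid, key=lambda mu: profile_loglik(fractions, mu)[0])
    return best_mu, profile_loglik(fractions, best_mu)[1]


if __name__ == "__main__":
    # Synthetic check: exact expected fractions for lam = 0.35, mu = 1.5,
    # over delays of 0..10 weeks (invented values, not real data).
    synthetic = [0.35 * poisson_pmf(d, 1.5) for d in range(11)]
    print(fit_delay_model(synthetic))
    # The dummy table from the post (weeks 0..3).
    print(fit_delay_model([0.2, 0.3, 0.25, 0.2]))
```

Because the fractions enter the likelihood as weights, there is no need to expand them into a 1D vector of repeated week numbers. Multiplying the log-likelihood by the number of A events in the cohort (e.g. 500) gives the full likelihood. You would need that full likelihood for standard errors, or to compare cohorts whose lambda estimates differ.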
The cumulative fraction of A events that have produced B events always approaches an asymptote between 0.25 and 0.45, never higher, but now it looks like the asymptotes are getting smaller (the behaviour of the system is changing).

In a sense, this breaks down into two questions:

1) What distribution should I try to fit to my data?
2) How do I present my data to the functions that will try to fit the distribution to it?

The reason for the second is that, while I have examined lots of functions (fBasics, MASS, &c.) that will try to fit a distribution to data, they all seem to expect a 1D vector of data, and none of them say anything about how the data should be structured, or what to do if you already have an empirical (cumulative) distribution.

To try out the functions that fit distributions, I created a dummy vector where the initial sample size was 1000, and the number of values equal to a given week number was 1000 * the fraction of A events that produced B events in that week. E.g. using the sample numbers above, there'd be 200 '0's, 300 '1's, 250 '2's, &c.

Thanks

Ted

--
View this message in context: http://www.nabble.com/What-distribution-is-related-to-hypergeometric--tp19671054p19671054.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.