RE: [R] Zipf random number generation
Thank you very much again. Your code is really helpful and it demontrates the power of R to a new-comer like me. --- [EMAIL PROTECTED] wrote: > Understood! Though I think you mean > > 1/c = sum(1.0 / pow((double) j, alpha)), > j=1,2,...,N You are right. Weiguang __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Zipf random number generation
On 25-Jan-05 Weiguang Shi wrote: > Sorry. > > c = sum(1.0 / pow((double) j, alpha)), j=1,2,...,N Understood! Though I think you mean 1/c = sum(1.0 / pow((double) j, alpha)), j=1,2,...,N Anyway, provided alpha > 1 the series converges for the infinite sum, but if you limit the range to 1...N then you could have any value of alpha (even 0 or negative). Be that as it may, I don't know of any R function for sampling such a distribution directly, but you could certainly try the suggestion I gave at the end of my previous response: N <- 1 ##For instance. You need N large enough i0 <- (1:N) p <- 1/(i0^alpha) ; p <- p/sum(p) X <- sample (i0, n, replace=TRUE, prob=p) Then X will be a random sample of size n from the distribution you describe on the range 1:N. When I tried the above with alpha=1.1, N=1, n=1000 the largest value of X was 9928, i.e. very nearly equal to N, so this suggested that N=1 may not be large enough! So I then tried it with N=5 and got max(X) = 48303; then with N=10 and max(X) = 94914. And so on (at this point I was hitting swap-space on this little machine). In short: with alpha so near 1, the sample will tend to extend over the whole range you give it when you specify N. This is an indication that the simple-minded method of my suggestion may fail to provide sufficiently large sampled values when alpha is near 1. However, since the "common practices" in your area, as you describe them below, do set a value of N, then presumably there is also a "common view" of what limit for N is acceptable -- i.e. in your area it does not matter if you fail to sample values larger than N. In that case the above method should be satisfactory. If not, then some other approach would be needed. You certainly don't want to use the above method with N approaching the largest machine-representable integer! One possibility might be to split the sampling. Let alpha > 1. Choose N fairly large. You would need a method for evaluating sum(i^alpha) from 1 to infinity. Let P = sum[1:N](i^alpha)/sum[1:inf](i^alpha). With probability P, sample on (1:N) with the above method. With probability (1-P), use a continuous approximation on [(N+1):inf] with density proportional to 1/(x^alpha). For the latter, the CDF is easily evaluated analytically; likewise its inverse function. Hence by sampling from a uniform distribution you can sample X from the distribution with density proportional to 1/(x^alpha) on [(N+1):inf]. Then get the sampled value of i as floor(X). This is in interesting question! I hope other readers can suggest good ideas. Hoping this helps, Ted. > --- Weiguang Shi <[EMAIL PROTECTED]> wrote: >> Thanks very much Ted for the detailed explanation. >> >> It might be flawed but some "common practices" in my >> area use the following approach to generate a random >> rank for some Zipf-like distribution with the >> parameter alpha. >> >> The key is to introduce the limit of ranks, N. With >> N >> and alpha, therefore, the probability of rank i is >> determined by >> c/pow((double) i, alpha) >> where >> c = sum(1.0 / pow((double) i, alpha)) >> >> Any comment on that? >> >> Thanks, >> Weiguang >> >> --- [EMAIL PROTECTED] wrote: >> > On 25-Jan-05 Weiguang Shi wrote: >> > > Hi, >> > > >> > > Is there a Zipf-like distribution RNG in R? >> > > >> > > Thanks, >> > > Weiguang >> > >> > "Zipf's Law" (as originally formulated in studies >> of >> > the >> > frequencies of words in texts) is to the effect >> that >> > the >> > relative frequencies with which words occur once, >> > twice, >> > 3 times, ... are in proportion to 1/1, 1/2, 1/3, >> ... >> > , >> > 1/n, ... with no limit on n (i.e. the number of >> > different >> > words each represented n times is proportional to >> > 1/n). >> > >> > This is "improper" since sum(1/n) = infinity, so >> > does not >> > define a probability distribution. A respectable >> > analogue >> > is Fisher's logarithmic distribution p(x) where >> > >> > p(x) = ((t^x)/x)/(-log(1-t)), x = 1, 2, 3, ... >> > >> > where t in (0,1) is the parameter of the >> > distribution. >> > >> > As t -> 1, p(x+1)/p(x) -> x/(x+1) as in Zipf's >> Law. >> > >> > However, I've searched the R site and have found >> > only >> > one instance of a function directly related to >> this >> > distribution, namely logff() in package VGAM, >> which >> > is for estimating the parameter t. >> > >> > So it looks as though there is no direct >> > implementation >> > of something like "rlogdist". >> > >> > However, the logarithmic distribution is a >> limiting >> > form of the negative binomial distribution >> > (conditional >> > on a positive value), and there are functions in R >> > for >> > random sampling from this distribution. >> > >> > From my reading of "?rnbinom" in the base package, >> > this >> > function does not have the flexibitility you would >> > need >> > for this. But in the MASS package there is the >> > function >> > rnegbin() which does. >
RE: [R] Zipf random number generation
Sorry. c = sum(1.0 / pow((double) j, alpha)), j=1,2,...,N --- Weiguang Shi <[EMAIL PROTECTED]> wrote: > Thanks very much Ted for the detailed explanation. > > It might be flawed but some "common practices" in my > area use the following approach to generate a random > rank for some Zipf-like distribution with the > parameter alpha. > > The key is to introduce the limit of ranks, N. With > N > and alpha, therefore, the probability of rank i is > determined by > c/pow((double) i, alpha) > where > c = sum(1.0 / pow((double) i, alpha)) > > Any comment on that? > > Thanks, > Weiguang > > --- [EMAIL PROTECTED] wrote: > > On 25-Jan-05 Weiguang Shi wrote: > > > Hi, > > > > > > Is there a Zipf-like distribution RNG in R? > > > > > > Thanks, > > > Weiguang > > > > "Zipf's Law" (as originally formulated in studies > of > > the > > frequencies of words in texts) is to the effect > that > > the > > relative frequencies with which words occur once, > > twice, > > 3 times, ... are in proportion to 1/1, 1/2, 1/3, > ... > > , > > 1/n, ... with no limit on n (i.e. the number of > > different > > words each represented n times is proportional to > > 1/n). > > > > This is "improper" since sum(1/n) = infinity, so > > does not > > define a probability distribution. A respectable > > analogue > > is Fisher's logarithmic distribution p(x) where > > > > p(x) = ((t^x)/x)/(-log(1-t)), x = 1, 2, 3, ... > > > > where t in (0,1) is the parameter of the > > distribution. > > > > As t -> 1, p(x+1)/p(x) -> x/(x+1) as in Zipf's > Law. > > > > However, I've searched the R site and have found > > only > > one instance of a function directly related to > this > > distribution, namely logff() in package VGAM, > which > > is for estimating the parameter t. > > > > So it looks as though there is no direct > > implementation > > of something like "rlogdist". > > > > However, the logarithmic distribution is a > limiting > > form of the negative binomial distribution > > (conditional > > on a positive value), and there are functions in R > > for > > random sampling from this distribution. > > > > From my reading of "?rnbinom" in the base package, > > this > > function does not have the flexibitility you would > > need > > for this. But in the MASS package there is the > > function > > rnegbin() which does. > > > > You would need to invoke > > > > rnegbin(N, mu=..., theta=... ) > > > > where the value ... of mu is large and the value > ... > > of > > theta is small, and reject cases with value zero. > > This > > of course makes it awkward to generate a sample of > > given > > size. > > > > Alternatively, you can try along the lines of > > > > > x0<-1:1; t<-0.999; > p<-((t^x0)/x0)/(-log(1-t)) > > > Y<-sample(x0,5000,replace=TRUE,prob=p) > > > hist(Y,breaks=100) > > > > While this gives the logarithmic distribution over > > the > > range of x in x0, it is inexact in that it does > not > > permit values greater than max(x0) to be sampled. > > > > No doubt others can suggest something better than > > this! > > > > Best wishes, > > Ted. > > > > > > > > > E-Mail: (Ted Harding) > <[EMAIL PROTECTED]> > > Fax-to-email: +44 (0)870 094 0861 [NB: New > number!] > > Date: 25-Jan-05 > > > Time: 10:20:57 > > -- XFMail > > -- > > > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Zipf random number generation
Thanks very much Ted for the detailed explanation. It might be flawed but some "common practices" in my area use the following approach to generate a random rank for some Zipf-like distribution with the parameter alpha. The key is to introduce the limit of ranks, N. With N and alpha, therefore, the probability of rank i is determined by c/pow((double) i, alpha) where c = sum(1.0 / pow((double) i, alpha)) Any comment on that? Thanks, Weiguang --- [EMAIL PROTECTED] wrote: > On 25-Jan-05 Weiguang Shi wrote: > > Hi, > > > > Is there a Zipf-like distribution RNG in R? > > > > Thanks, > > Weiguang > > "Zipf's Law" (as originally formulated in studies of > the > frequencies of words in texts) is to the effect that > the > relative frequencies with which words occur once, > twice, > 3 times, ... are in proportion to 1/1, 1/2, 1/3, ... > , > 1/n, ... with no limit on n (i.e. the number of > different > words each represented n times is proportional to > 1/n). > > This is "improper" since sum(1/n) = infinity, so > does not > define a probability distribution. A respectable > analogue > is Fisher's logarithmic distribution p(x) where > > p(x) = ((t^x)/x)/(-log(1-t)), x = 1, 2, 3, ... > > where t in (0,1) is the parameter of the > distribution. > > As t -> 1, p(x+1)/p(x) -> x/(x+1) as in Zipf's Law. > > However, I've searched the R site and have found > only > one instance of a function directly related to this > distribution, namely logff() in package VGAM, which > is for estimating the parameter t. > > So it looks as though there is no direct > implementation > of something like "rlogdist". > > However, the logarithmic distribution is a limiting > form of the negative binomial distribution > (conditional > on a positive value), and there are functions in R > for > random sampling from this distribution. > > From my reading of "?rnbinom" in the base package, > this > function does not have the flexibitility you would > need > for this. But in the MASS package there is the > function > rnegbin() which does. > > You would need to invoke > > rnegbin(N, mu=..., theta=... ) > > where the value ... of mu is large and the value ... > of > theta is small, and reject cases with value zero. > This > of course makes it awkward to generate a sample of > given > size. > > Alternatively, you can try along the lines of > > > x0<-1:1; t<-0.999; p<-((t^x0)/x0)/(-log(1-t)) > > Y<-sample(x0,5000,replace=TRUE,prob=p) > > hist(Y,breaks=100) > > While this gives the logarithmic distribution over > the > range of x in x0, it is inexact in that it does not > permit values greater than max(x0) to be sampled. > > No doubt others can suggest something better than > this! > > Best wishes, > Ted. > > > > E-Mail: (Ted Harding) <[EMAIL PROTECTED]> > Fax-to-email: +44 (0)870 094 0861 [NB: New number!] > Date: 25-Jan-05 > Time: 10:20:57 > -- XFMail > -- > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Zipf random number generation
On 25-Jan-05 Weiguang Shi wrote: > Hi, > > Is there a Zipf-like distribution RNG in R? > > Thanks, > Weiguang "Zipf's Law" (as originally formulated in studies of the frequencies of words in texts) is to the effect that the relative frequencies with which words occur once, twice, 3 times, ... are in proportion to 1/1, 1/2, 1/3, ... , 1/n, ... with no limit on n (i.e. the number of different words each represented n times is proportional to 1/n). This is "improper" since sum(1/n) = infinity, so does not define a probability distribution. A respectable analogue is Fisher's logarithmic distribution p(x) where p(x) = ((t^x)/x)/(-log(1-t)), x = 1, 2, 3, ... where t in (0,1) is the parameter of the distribution. As t -> 1, p(x+1)/p(x) -> x/(x+1) as in Zipf's Law. However, I've searched the R site and have found only one instance of a function directly related to this distribution, namely logff() in package VGAM, which is for estimating the parameter t. So it looks as though there is no direct implementation of something like "rlogdist". However, the logarithmic distribution is a limiting form of the negative binomial distribution (conditional on a positive value), and there are functions in R for random sampling from this distribution. >From my reading of "?rnbinom" in the base package, this function does not have the flexibitility you would need for this. But in the MASS package there is the function rnegbin() which does. You would need to invoke rnegbin(N, mu=..., theta=... ) where the value ... of mu is large and the value ... of theta is small, and reject cases with value zero. This of course makes it awkward to generate a sample of given size. Alternatively, you can try along the lines of > x0<-1:1; t<-0.999; p<-((t^x0)/x0)/(-log(1-t)) > Y<-sample(x0,5000,replace=TRUE,prob=p) > hist(Y,breaks=100) While this gives the logarithmic distribution over the range of x in x0, it is inexact in that it does not permit values greater than max(x0) to be sampled. No doubt others can suggest something better than this! Best wishes, Ted. E-Mail: (Ted Harding) <[EMAIL PROTECTED]> Fax-to-email: +44 (0)870 094 0861 [NB: New number!] Date: 25-Jan-05 Time: 10:20:57 -- XFMail -- __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Zipf random number generation
Hi, Is there a Zipf-like distribution RNG in R? Thanks, Weiguang __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html