Ivan Kazmenko:
Still, the current implementation uses divisions (which are slow on most CPUs). For a good source of random bits such as MT19937, an implementation that repeatedly draws exactly core.bitop.bsr(LIMIT)+1 bits until the result is in [0, LIMIT) may be faster.
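For illustration, the bit-drawing loop described above could look roughly like the sketch below. This is not the std.random implementation; `uniformBelow` is a hypothetical helper name, and taking the low bits of each 32-bit MT19937 output is an assumption made for brevity (each call consumes a whole 32-bit word, so no random bits are reused).

```d
import core.bitop : bsr;
import std.random : Mt19937;

/// Hypothetical helper (not part of Phobos): returns a value in [0, limit)
/// using masking and rejection only, with no division or modulo.
uint uniformBelow(ref Mt19937 gen, uint limit)
{
    assert(limit > 0, "limit must be positive");

    // Number of bits to draw per attempt, as suggested: bsr(limit) + 1.
    immutable int bits = bsr(limit) + 1;        // always in 1 .. 32

    // Mask that keeps the low `bits` bits of a 32-bit word.
    immutable uint mask = uint.max >> (32 - bits);

    while (true)
    {
        // Assumption: use the low bits of one full generator output.
        immutable uint candidate = gen.front & mask;
        gen.popFront();

        // Accept only values that fall inside [0, limit).
        if (candidate < limit)
            return candidate;
    }
}
```

Since limit >= 2^bsr(limit), each attempt is accepted with probability at least 1/2, so the expected number of draws is at most two. When limit is an exact power of two, bsr(limit)+1 asks for one bit more than strictly needed; bsr(limit - 1) + 1 would be tighter in that case. Whether this actually beats the division-based version would have to be measured.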
Is this worth reporting in Bugzilla? Bye, bearophile