Just to chime in: Numba would definitely appreciate C functions to access the random distribution implementations, and we have a side project (numba-scipy) that makes SciPy's Cython-wrapped functions visible to Numba.
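For a consumer like Numba, the contract that matters is the one Kevin describes in the thread: a distribution is a plain C function that takes a pointer to a `bitgen_t` and draws raw values through its function pointers. Below is a minimal, self-contained C sketch under stated assumptions. The struct mirrors the field layout of NumPy's `bitgen_t` (a state pointer plus `next_uint64`, `next_uint32`, `next_double`, `next_raw`). The SplitMix64 generator and `bitgen_init_splitmix64` are hypothetical stand-ins for a real BitGenerator such as PCG64. `npy_random_bounded_uint64` is a hypothetical name using the `npy_random_` prefix proposed in the thread; it is a custom rejection sampler (masked rejection for a bounded integer), not NumPy's own code.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the layout of NumPy's bitgen_t: an opaque state pointer plus
   function pointers that produce raw pseudo-random values. */
typedef struct bitgen {
    void *state;
    uint64_t (*next_uint64)(void *st);
    uint32_t (*next_uint32)(void *st);
    double (*next_double)(void *st);
    uint64_t (*next_raw)(void *st);
} bitgen_t;

/* Hypothetical toy bit generator (SplitMix64) standing in for PCG64 etc. */
typedef struct { uint64_t s; } splitmix64_state;

static uint64_t splitmix64_next64(void *st) {
    splitmix64_state *p = (splitmix64_state *)st;
    uint64_t z = (p->s += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

static uint32_t splitmix64_next32(void *st) {
    return (uint32_t)(splitmix64_next64(st) >> 32);
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double splitmix64_next_double(void *st) {
    return (double)(splitmix64_next64(st) >> 11) * (1.0 / 9007199254740992.0);
}

/* Hypothetical helper: wires the toy generator into a bitgen_t. */
void bitgen_init_splitmix64(bitgen_t *bg, splitmix64_state *st, uint64_t seed) {
    assert(bg && st);
    st->s = seed;
    bg->state = st;
    bg->next_uint64 = splitmix64_next64;
    bg->next_uint32 = splitmix64_next32;
    bg->next_double = splitmix64_next_double;
    bg->next_raw = splitmix64_next64;
}

/* Hypothetical user-written distribution: uniform integer in [0, rng] by
   masked rejection sampling. It needs only bitgen_t, no Python objects. */
uint64_t npy_random_bounded_uint64(bitgen_t *bg, uint64_t rng) {
    uint64_t mask = rng, draw;
    if (rng == 0) return 0;
    /* Smallest all-ones bit mask that covers rng. */
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    mask |= mask >> 32;
    /* Reject masked draws above rng; each try is accepted more than half the time. */
    do {
        draw = bg->next_uint64(bg->state) & mask;
    } while (draw > rng);
    return draw;
}
```

A caller that already holds a `bitgen_t` pointer (for example, one taken from a BitGenerator's capsule, as discussed in the thread) would skip the toy initializer and pass that pointer straight to the distribution function.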
On Thu, Sep 19, 2019 at 5:41 AM Kevin Sheppard <kevin.k.shepp...@gmail.com> wrote:

> On Thu, Sep 19, 2019 at 10:23 AM Ralf Gommers <ralf.gomm...@gmail.com> wrote:
>
>> On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard <kevin.k.shepp...@gmail.com> wrote:
>>
>>> There are some users of the NumPy C code in randomkit. This was never
>>> officially supported. There has been a long open issue to provide this
>>> officially.
>>>
>>> When I wrote randomgen I supplied .pxd files that make it simpler to
>>> write Cython code that uses the components. The lower-level API has not
>>> had much scrutiny and is in need of a clean-up. I thought this would also
>>> encourage users to extend the random machinery themselves as part of their
>>> project or code so as to minimize the requests for new (exotic)
>>> distributions to be included in Generator.
>>>
>>> Most of the generator functions follow a pattern random_DISTRIBUTION.
>>> Some have a bit more name mangling which can easily be cleaned up (like
>>> random_gauss_zig, which should become PREFIX_standard_normal).
>>>
>>> Ralf Gommers suggested unprefixed names.
>>>
>>
>> I suggested that the names should match the Python API, which I think
>> isn't quite the same. The Python API doesn't contain things like "gamma",
>> "t" or "f".
>>
>
> By gamma and f (I misspoke about t) I mean the names that appear as
> Generator methods:
>
> https://docs.scipy.org/doc/numpy/reference/random/generator.html#numpy.random.Generator
>
> If I understand your point (and with reference to the page linked below),
> then there would be something like numpy.random.cython_random.gamma (which
> is currently called numpy.random.distributions.random_gamma). Maybe I'm not
> understanding your point about the Python API though.
>
>>
>>> I tried this in a local branch and it was a bit ugly since some of the
>>> distributions have common math names (e.g., gamma) and others are very
>>> short (e.g., t or f).
>>> I think a prefix is needed, and after looking
>>> through the C API docs npy_random_ seemed like a reasonable choice (since
>>> these live in numpy.random).
>>>
>>> Any thoughts on the following questions are welcome (others too):
>>>
>>> 1. Should there be a prefix on the C functions?
>>> 2. If so, what should the prefix be?
>>>
>>
>> Before worrying about naming details, can we start with "what should be
>> in the C/Cython API"? If I look through the current pxd files, there's a
>> lot there that looks like it should be private, and what we expose as
>> Python API is not all present as far as I can tell (which may be fine, if
>> the only goal is to let people write new generators rather than use the
>> existing ones from Cython without the Python overhead).
>>
>
> From the ground up, for someone who wants to write a new distribution:
> 1. The bit generators. These currently have no pxd files. These are
> always going to be Python objects and so it isn't absolutely essential to
> expose them with a low-level API. All that is really needed is the capsule,
> which holds the bitgen struct.
> 2. bitgen_t, which is in common.pxd. This is essential since it enables
> access to the callables that produce basic pseudo-random values.
> 3. The distributions, which are in distributions.pxd. The integer
> generators are in bounded_integers.pxd.in, which would need to be
> processed and then included after processing (same for
> bounded_integers.pxd.in).
>    a. The legacy distributions are in legacy_distributions.pxd. If the
>    legacy code is included, then aug_bitgen_t, which is also in
>    legacy_distributions.pxd, needs to be included as well.
> 4. The "helpers" which are defined in common.pxd. These simplify
> implementing complete distributions which support automatic broadcasting
> when needed. They are only provided to match the signatures for the
> functions in distributions.pxd. The highest level ones are cont() and
> disc().
> Some of the lower-level ones could easily be marked as private.
>
> 1, 2 and 3 are pretty important. 4 could be in or out. It could help if
> someone wanted to write a fully featured distribution w/ broadcasting, but
> I think this use case is less likely than someone, say, wanting to
> implement a custom rejection sampler.
>
> For someone who wants to write a new BitGenerator:
>
> 1. BitGenerator and SeedSequence in bit_generator.pxd are required, as is
> bitgen_t, which is in common.pxd. bitgen_t should probably move to
> bit_generators.
> 2. aligned_malloc: This has been requested on multiple occasions and is
> practically important when interfacing with SSE or AVX code. It is
> potentially useful beyond the random module. This lives in common.pxd.
>
>>
>> In the end we want to get to a doc section similar to
>> http://scipy.github.io/devdocs/special.cython_special.html I'd think.
>>
>>> 3. Should the legacy C functions be part of the API -- these are mostly
>>> the ones that produce or depend on polar transform normals (Box-Muller). I
>>> have a feeling no, but there may be reasons to prefer BM since they do not
>>> depend on rejection sampling.
>>>
>>
>> Even if a couple of users were interested, it would be odd to start
>> doing this after deeming the code "legacy". So I agree with your "no".
>>
>>> 4. Should the low-level API be consumable like any other NumPy C API by
>>> including the usual header locations and library locations? Right now, the
>>> pxd simplifies writing Cython, but users have to specify the location of
>>> the headers and source manually. An alternative would be to provide a
>>> function like np.get_include() -> np.random.get_include() that would
>>> specialize in random.
>>>
>>
>> Good question. I'm not sure this is "like any other NumPy C API". We
>> don't provide a C API for fft, linalg or other functionality further from
>> core either. It's possible of course, but does it really help library
>> authors or end users?
>>
>
> SciPy provides a very useful Cython API to low-level linalg. But there is
> little reason to provide C APIs to fft or linalg since they are all
> directly available. The code in random is, AFAICT, one of the more complete
> C implementations of functions needed to produce variates from many
> distributions (mostly due to its ancestor randomkit, which AFAICT isn't
> maintained).
>
> An ideal API would allow projects like
> https://github.com/deepmind/torch-randomkit/tree/master/randomkit or
> numba to consume the code in NumPy without vendoring it.
>
> Best wishes,
> Kevin
>
>> Cheers,
>> Ralf
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
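Kevin's aligned_malloc point can be illustrated with the common over-allocate-and-round-up technique. This is a minimal C sketch under stated assumptions: `sketch_aligned_malloc` and `sketch_aligned_free` are hypothetical names, not NumPy's helper, and where they are available C11 `aligned_alloc` or POSIX `posix_memalign` may be simpler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical aligned allocator (not NumPy's implementation). It
   over-allocates, rounds the address up to the requested alignment and
   stashes malloc's original pointer just before the aligned block. */
void *sketch_aligned_malloc(size_t size, size_t alignment) {
    void *raw;
    uintptr_t addr;
    /* Alignment must be a power of two no smaller than a pointer. */
    assert(alignment >= sizeof(void *) && (alignment & (alignment - 1)) == 0);
    if (size > SIZE_MAX - alignment - sizeof(void *)) return NULL; /* overflow */
    raw = malloc(size + alignment + sizeof(void *));
    if (raw == NULL) return NULL;
    /* Leave room for one pointer, then round up to the next multiple. */
    addr = ((uintptr_t)raw + sizeof(void *) + alignment - 1)
           & ~(uintptr_t)(alignment - 1);
    ((void **)addr)[-1] = raw; /* remember where malloc's block starts */
    return (void *)addr;
}

/* Frees memory from sketch_aligned_malloc; NULL is a no-op, like free(). */
void sketch_aligned_free(void *p) {
    if (p != NULL) free(((void **)p)[-1]);
}
```

Because the original pointer lives in the word just before the aligned block, memory from this allocator must be released with the matching free function, never with plain `free()`.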