Re: [Numpy-discussion] Addition of new distributions: Polya-gamma
Thanks for the insightful comments. I totally understand the concerns with maintenance. After some thought, I agree that maybe it would be best to have it as a separate package and hopefully get it added to scipy in the future? I will continue development at https://github.com/zoj613/polya-gamma for those interested in using the code or contributing. -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] using scalar input on np.PyArray_MultiIterNew2
Hi all, I am looking for a way to use `np.PyArray_MultiIterNew2` in Cython to broadcast parameters of a function. The requirement is that the two arguments can be scalar and/or sequences. Using the usual `np.broadcast` function works well but is slow when iterating over the broadcasted input in a tight loop. I want to achieve the same using the C API. Currently, if I used `(np.PyArray_MultiIter_DATA(bcast, i))[0]` to iterate over the input when one of them is a scalar, I get no errors, but I notice the output of the parent function returns an array of zeros, which implies this approach didn't work. After investigating, it seems that np.PyArray_MultiIter_DATA only accepts numpy arrays. I could write a function to handle all combinations of scalar/array/list/tuple, and create temporary arrays to store the input data, but that seems daunting and error prone. Is there a way I can achieve this and have scalar arguments passed to np.PyArray_MultiIter_DATA be converted to same-element arrays without writing my own code from scratch? Any suggestions are welcome. -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] using scalar input on np.PyArray_MultiIterNew2
Sebastian Berg wrote > On Sun, 2021-01-10 at 09:59 -0700, zoj613 wrote: >> Hi all, >> >> I am looking for a way to use `np.PyArray_MultiIterNew2` in Cython to >> broadcast parameters of a function. The requirement is that the two >> arguments can be scalar and/or sequences. Using the usual >> `np.broadcast` >> function works well but is slow when iterating over the broadcasted >> input in >> a tight loop. I want to achieve the same using the C API. > > NB: There is also a different, newer, iterator (`NpyIter_New`). > Depending on what you do, that might be a lot faster and better > (although, I admit it tends to be more verbose too). > That iterator also does broadcasting, and one nice property is that it > can allocate a result array. Most importantly, it casts for you and > allows you to take charge of the inner-loop (a one dimensional loop) > for performance. (This is the core of how ufuncs work.) > >> Currently, if I used `(<double*>np.PyArray_MultiIter_DATA(bcast, >> i))[0]` to >> iterate over the input when one of them is a scalar, >> I get no errors, but I notice the output of the parent function >> returns an >> array of zeros, which implies this approach didn't work. After >> investigating, it seems that np.PyArray_MultiIter_DATA only accepts >> numpy >> arrays. > > I am honestly confused by this (did you mean a different command?). > `PyArray_MultiIter_DATA` returns a pointer to the raw data, i.e. in > cython you would probably do: > > cdef double *value_ptr = > > npc.PyArray_MultiIter_DATA(iter, 0) > value_ptr[0] = 3.1416 > > Do want to reference the original data? You could reference the > original data for scalars (read-only since scalars are immutable), but > for lists/tuples there is no original data to begin with (unless you > have Python objects inside, but it would seem strange if NumPy to > attempted to transparently expose that). > >> I could write a function to handle all combinations of >> scalar/array/list/tuple, and create temporary arrays to store the >> input > > The typical pattern is to convert everything to array first, but > `PyArray_MultiIter()` does that for you. Unless you want to optimize > that conversion away? > > Cheers, > > Sebastian > > >> data, but that seems daunting and error prone. Is there a way I can >> achieve >> this and have scalar arguments passed to np.PyArray_MultiIter_DATA be >> converted to same-element arrays without writing my own code from >> scratch? >> >> Any suggestions are welcome. >> >> >> >> -- >> Sent from: http://numpy-discussion.10968.n7.nabble.com/ >> ___ >> NumPy-Discussion mailing list >> > NumPy-Discussion@ >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > I think I wasn't clear enough in my original question. Here is what I > have: > a python function with the signature: def pyfunc(a, b). This function > calls another cython function internally such that > def pyfunc(a, b): > > return cfunc(a, b) > > I want the pyfunc to support scalar and array_like input for both > parameters, so that internally I can have: > def pyfunc(a, b): > > # some part of the function > for i, j in some_broadcasted_iterator: > output[indx] = cfunc(i, j) > > Broadcasting using python level functions allows me to do this but then > the iteration part is slow because it uses python python code, meaning I > can't release the gil. What I was thinking of doing is using the C API to > create the broadcasted iterator using PyArray_MultiIterNew2 then iterating > over that. I kept getting a zero array as output when I tested it so I > assumed that the function doesnt work with scalars. It turns out the bug > was caused by me using integer scalars when the C function expects doubles > (But I fixed that after creating this post). > > Now I'm left with figuring out how I can efficiently convert even > lists/tuple inputs into a broadcasted iterator (without requiring the user > to pass numpy arrays). When I pass a list for parameter a and a scalar for > parameter b, the program hangs, so Im not sure if the > PyArray_MultiIterNew2 function does it for me? > > Btw, do you have a link to the more recent NpyIter_New? I couldn't find it > reading the C-API page in the docs. > > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@ > https://mail.python.org/mailman/listinfo/numpy-discussion > > > signature.asc (849 bytes) > <http://numpy-discussion.10968.n7.nabble.com/attachment/48862/0/signature.asc>; -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] using scalar and tuple/list input on np.PyArray_MultiIterNew2
For what it's worth, I ended up defining a function like: cdef np.broadcast broadcast(object a, object b): """ Broadcast the inputs into a multiIterator object. the input can be a scalar, list, tuple or numpy array or array_like object. """ cdef bint is_a_seq = is_sequence(h) cdef bint is_a_seq = is_sequence(z) h = h if not is_a_seq else np.PyArray_FROM_OT(a, np.NPY_DOUBLE) z = z if not is_b_seq else np.PyArray_FROM_OT(b, np.NPY_DOUBLE) return np.PyArray_MultiIterNew2(a, b) It seems to do exactly what I want. Not sure if there are any potential bugs that could result from this approach. -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] using scalar and tuple/list input on np.PyArray_MultiIterNew2
No, The function is what I ended up with, I just forgot to change (h, z) to (a, b). It works fine. Yes, I want to ensure that I end up with a double type even if an integer value is passed. I didn't think of unconditionally using np.PyArray_From_OT() even for scalars. I guess that is probably a better approach instead of the if statements because ultimately for that code block I want a broacasted array as a result, not scalars. I will experiment with the new multi-iterator interface and see if it is better than the current approach. Thanks! -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Addition of new distributions: Polya-gamma
As a follow-up to my last post here and comments from Robert Kern. I have just released version 1.0.0 of the sampler on PyPI. The work is pretty much done (I hope). Feedback would be very much appreciated. -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Using logfactorial instead of loggamma in random_poisson sampler
Hi All, I noticed that the transformed rejection method for generating Poisson random variables used in numpy makes use of the `random_loggam` function which directly calculates the log-gamma function. It appears that a log-factorial lookup table was added a few years back which could be used in place of random_loggam since the input is always an integer. Is there a reason for not using this table instead? See link below for the line of code: https://github.com/numpy/numpy/blob/6222e283fa0b8fb9ba562dabf6ca9ea7ed65be39/numpy/random/src/distributions/distributions.c#L572 Regards Zolisa -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] guide for downstream package authors & setting version constraints
Thanks you, this looks very informative. Is there a best practice guide somewhere in the docs on how to correctly expose C-level code to third parties via .pxd files, similarly to how one can access the c_distributions of numpy via cython? I tried this previously and failed miserably. It seemed like symbols for some C functions I tried to expose to the user via cython declaration could not be found. I know I did something wrong, but im not sure what (I linked the header files and everything). The Cython docs were not very helpful. Maybe scipy/numpy devs could shed some light on how this is properly done? -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Using logfactorial instead of loggamma in random_poisson sampler
Ah, I had a suspicion that it was to preserve the random stream but wasn't too sure. Thanks for the clarification. -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] guide for downstream package authors & setting version constraints
Thanks for the suggestion. However I was able to solve the issue I had by just creating inline wrapper functions in cython for the C functions so I dont have to link them when importing in other 3rd party cython modules. -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Using logfactorial instead of loggamma in random_poisson sampler
What do you think is the explanation for that? I had assumed that using a lookup table would be faster considering that the loggam implementation has loops and makes calls to elementary functions in it. -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] How to get Boolean matrix for similar lists in two different-size numpy arrays of lists
The following seems to produce what you want using the data provided ``` In [31]: dF = np.genfromtxt('/home/F.csv', delimiter=',').tolist() In [32]: dS = np.genfromtxt('/home/S.csv', delimiter=',').tolist() In [33]: r = [True if i in lS else False for i in dF] In [34]: sum(r) Out[34]: 300 ``` I hope this helps. -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Improving performance of the `numpy.any` function.
Hi All, I was using numpy's `any` function earlier and realized that it might not be as performant as I assumed. See the code below: ``` In [1]: import numpy as np In [2]: a = np.zeros(1_000_000) In [3]: a[100] = 1 In [4]: b = np.zeros(2_000_000) In [5]: b[100] = 1 In [6]: %timeit np.any(a) 1.33 ms ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [7]: %timeit np.any(b) 2.65 ms ± 71.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In [8] %timeit any(a) 13.4 µs ± 354 ns per loop (mean ± std. dev. of 7 runs, 10 loops each) In [9]: %timeit any(b) 13.3 µs ± 219 ns per loop (mean ± std. dev. of 7 runs, 10 loops each) ``` It looks like `np.any` does not terminate early as soon as it finds a non-zero element, compared to the built-in any function. So this prompted me to go look at the source code for `PyArray_Any`: https://github.com/numpy/numpy/blob/623bc1fae1d47df24e7f1e29321d0c0ba2771ce0/numpy/core/src/multiarray/calculation.c#L790 . Taking a peek at the `PyArray_GenericReduceFunction` it looks like it just calls the `reduce` function from python on every single element of the array and there is no mechanism in place to short-circuit the call if an element has been found early. It seems worthwhile to have a specific function that allows early stopping of the reduction? So my question is: Is there a reason why maintainers did not implement such a feature? I would also like to know if everyone thinks its worthwhile to implement one? Im guessing there are many unsuspecting users like myself who call the `np.any` function on arrays assuming early-stopping is supported. I think existing code that calls the function would benefit a lot. Let me know what you think. Regards -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Improving performance of the `numpy.any` function.
Although still much slower than the builtin `any`, this is an interesting and strange alternative way to improve the timing. My speeds are a result of using a _very_ old machine with a low-grade processor so maybe these times are more exaggerated to me than others. -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Inaccurate documentation of the random c-api
Hi All, https://numpy.org/devdocs/reference/random/c-api.html has an inaccurate description of the API that goes as follows: "/zig in the name are based on a ziggurat lookup algorithm is used instead of calculating the log, which is significantly faster. The non-ziggurat variants are used in corner cases and for legacy compatibility./". It appears this is no longer the case and instead there are functions that have `inv` in the name to signal the use of the inverse method for sampling. I submited a PR to reflect this and also added the missing function signatures at https://github.com/numpy/numpy/pull/18797/files Regards -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Checking if all array elements are whole numbers using the C-API
Hi All, Is there a C-API analogue for `np.asarray(a, dtype=int) != np.asarray(a)` assuming that `a` is an array or python sequence object of whole numbers but is assigned the float type (e.g a=array([1., 2., 3.]) or [1., 2., 3.])? I tried `np.PyArray_FROM_OT(a, np.NPY_LONG) != np.PyArray_FROM_O(a)` but I get the error message: `TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'`. Is there a way I can achieve what I want using the numpy array c-api? I want to check if an array/sequence has all elements as whole numbers so that the check wont fail is an element is "2.0" instead of "2". -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Checking if all array elements are whole numbers using the C-API
After reading the docs, it turns out that `np.PyArray_FROM_OTF(a, np.NPY_LONG, np.NPY_ARRAY_FORCECAST) != np.PyArray_FROM_O(a)` prevents the error from occurring. -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ ___ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion