Re: [Numpy-discussion] Addition of new distributions: Polya-gamma

2020-12-31 Thread zoj613
Thanks for the insightful comments. I totally understand the concerns with
maintenance. After some thought, I agree that maybe it would be best to have
it as a separate package and hopefully get it added to scipy in the future?
I will continue development at https://github.com/zoj613/polya-gamma for
those interested in using the code or contributing.



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] using scalar input on np.PyArray_MultiIterNew2

2021-01-10 Thread zoj613
Hi all,

I am looking for a way to use `np.PyArray_MultiIterNew2` in Cython to
broadcast parameters of a function. The requirement is that the two
arguments can be scalar and/or sequences. Using the usual `np.broadcast`
function works well but is slow when iterating over the broadcasted input in
a tight loop. I want to achieve the same using the C API.

Currently, if I used `(np.PyArray_MultiIter_DATA(bcast, i))[0]` to
iterate over the input when one of them is a scalar,
I get no errors, but I notice the output of the parent function returns an
array of zeros, which implies this approach didn't work. After
investigating, it seems that np.PyArray_MultiIter_DATA only accepts numpy
arrays.

I could write a function to handle all combinations of
scalar/array/list/tuple, and create temporary arrays to store the input
data, but that seems daunting and error prone. Is there a way I can achieve
this and have scalar arguments passed to np.PyArray_MultiIter_DATA be
converted to same-element arrays without writing my own code from scratch?

Any suggestions are welcome.



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] using scalar input on np.PyArray_MultiIterNew2

2021-01-10 Thread zoj613
 
Sebastian Berg wrote
> On Sun, 2021-01-10 at 09:59 -0700, zoj613 wrote:
>> Hi all,
>> 
>> I am looking for a way to use `np.PyArray_MultiIterNew2` in Cython to
>> broadcast parameters of a function. The requirement is that the two
>> arguments can be scalar and/or sequences. Using the usual
>> `np.broadcast`
>> function works well but is slow when iterating over the broadcasted
>> input in
>> a tight loop. I want to achieve the same using the C API.
> 
> NB: There is also a different, newer, iterator (`NpyIter_New`).
> Depending on what you do, that might be a lot faster and better
> (although, I admit it tends to be more verbose too).
> That iterator also does broadcasting, and one nice property is that it
> can allocate a result array.  Most importantly, it casts for you and
> allows you to take charge of the inner-loop (a one dimensional loop)
> for performance.  (This is the core of how ufuncs work.)
> 
>> Currently, if I used `(<double*>np.PyArray_MultiIter_DATA(bcast,
>> i))[0]` to
>> iterate over the input when one of them is a scalar,
>> I get no errors, but I notice the output of the parent function
>> returns an
>> array of zeros, which implies this approach didn't work. After
>> investigating, it seems that np.PyArray_MultiIter_DATA only accepts
>> numpy
>> arrays.
> 
> I am honestly confused by this (did you mean a different command?).
> `PyArray_MultiIter_DATA` returns a pointer to the raw data, i.e. in
> cython you would probably do:
> 
> cdef double *value_ptr = 
> 
> npc.PyArray_MultiIter_DATA(iter, 0)
> value_ptr[0] = 3.1416
> 
> Do want to reference the original data?  You could reference the
> original data for scalars (read-only since scalars are immutable), but
> for lists/tuples there is no original data to begin with (unless you
> have Python objects inside, but it would seem strange if NumPy to
> attempted to transparently expose that).
> 
>> I could write a function to handle all combinations of
>> scalar/array/list/tuple, and create temporary arrays to store the
>> input
> 
> The typical pattern is to convert everything to array first, but
> `PyArray_MultiIter()` does that for you. Unless you want to optimize
> that conversion away?
> 
> Cheers,
> 
> Sebastian
> 
> 
>> data, but that seems daunting and error prone. Is there a way I can
>> achieve
>> this and have scalar arguments passed to np.PyArray_MultiIter_DATA be
>> converted to same-element arrays without writing my own code from
>> scratch?
>> 
>> Any suggestions are welcome.
>> 
>> 
>> 
>> --
>> Sent from: http://numpy-discussion.10968.n7.nabble.com/
>> ___
>> NumPy-Discussion mailing list
>> 

> NumPy-Discussion@

>> https://mail.python.org/mailman/listinfo/numpy-discussion
>> 
> 
> I think I wasn't clear enough in my original question. Here is what I
> have:
> a python function with the signature: def pyfunc(a, b). This function
> calls another cython function internally such that
> def pyfunc(a, b):
> 
> return cfunc(a, b)
> 
> I want the pyfunc to support scalar and array_like input for both
> parameters, so that internally I can have:
> def pyfunc(a, b):
> 
> # some part of the function
> for i, j in some_broadcasted_iterator:
> output[indx] = cfunc(i, j)
> 
> Broadcasting using python level functions allows me to do this but then
> the iteration part is slow because it uses python python code, meaning I
> can't release the gil. What I was thinking of doing is using the C API to
> create the broadcasted iterator using PyArray_MultiIterNew2 then iterating
> over that. I kept getting a zero array as output when I tested it so I
> assumed that the function doesnt work with scalars. It turns out the bug
> was caused by me using integer scalars when the C function expects doubles
> (But I fixed that after creating this post).
> 
> Now I'm left with figuring out how I can efficiently convert even
> lists/tuple inputs into a broadcasted iterator (without requiring the user
> to pass numpy arrays). When I pass a list for parameter a and a scalar for
> parameter b, the program hangs, so Im not sure if the
> PyArray_MultiIterNew2 function does it for me?
> 
> Btw, do you have a link to the more recent NpyIter_New? I couldn't find it
> reading the C-API page in the docs.
> 
> 
> 
> ___
> NumPy-Discussion mailing list

> NumPy-Discussion@

> https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> 
> signature.asc (849 bytes)
> <http://numpy-discussion.10968.n7.nabble.com/attachment/48862/0/signature.asc>;





--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] using scalar and tuple/list input on np.PyArray_MultiIterNew2

2021-01-10 Thread zoj613
For what it's worth, I ended up defining a function like:
cdef np.broadcast broadcast(object a, object b):
"""
Broadcast the inputs into a multiIterator object.
the input can be a scalar, list, tuple or numpy array or array_like
object.
"""
cdef bint is_a_seq = is_sequence(h)
cdef bint is_a_seq = is_sequence(z)

h = h if not is_a_seq else np.PyArray_FROM_OT(a, np.NPY_DOUBLE)
z = z if not is_b_seq else np.PyArray_FROM_OT(b, np.NPY_DOUBLE)

return np.PyArray_MultiIterNew2(a, b)

It seems to do exactly what I want. Not sure if there are any potential bugs
that could result from this approach.



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] using scalar and tuple/list input on np.PyArray_MultiIterNew2

2021-01-11 Thread zoj613
No, The function is what I ended up with, I just forgot to change (h, z) to
(a, b). It works fine. Yes, I want to ensure that I end up with a double
type even if an integer value is passed. I didn't think of unconditionally
using np.PyArray_From_OT() even for scalars. I guess that is probably a
better approach instead of the if statements because ultimately for that
code block I want a broacasted array as a result, not scalars.  I will
experiment with the new multi-iterator interface and see if it is better
than the current approach. Thanks!



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Addition of new distributions: Polya-gamma

2021-01-27 Thread zoj613
As a follow-up to my last post here and comments from Robert Kern. I have
just released version 1.0.0 of the sampler on PyPI. The work is pretty much
done (I hope). Feedback would be very much appreciated.



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Using logfactorial instead of loggamma in random_poisson sampler

2021-03-06 Thread zoj613
Hi All,

I noticed that the transformed rejection method for generating Poisson
random variables used in numpy makes use of the `random_loggam` function
which directly calculates the log-gamma function. It appears that a
log-factorial lookup table was added a few years back which could be used in
place of random_loggam since the input is always an integer. Is there a
reason for not using this table instead? See link below for the line of
code:

https://github.com/numpy/numpy/blob/6222e283fa0b8fb9ba562dabf6ca9ea7ed65be39/numpy/random/src/distributions/distributions.c#L572

Regards
Zolisa



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] guide for downstream package authors & setting version constraints

2021-03-06 Thread zoj613
Thanks you, this looks very informative. Is there a best practice guide
somewhere in the docs on how to correctly expose C-level code to third
parties via .pxd files, similarly to how one can access the c_distributions
of numpy via cython? I tried this previously and failed miserably. It seemed
like symbols for some C functions I tried to expose to the user via
cython declaration could not be found. I know I did something wrong, but im
not sure what (I linked the header files and everything). The Cython docs
were not very helpful. Maybe scipy/numpy devs could shed some light on how
this is properly done?



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Using logfactorial instead of loggamma in random_poisson sampler

2021-03-06 Thread zoj613
Ah, I had a suspicion that it was to preserve the random stream but wasn't
too sure. Thanks for the clarification.



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] guide for downstream package authors & setting version constraints

2021-03-08 Thread zoj613
Thanks for the suggestion. However I was able to solve the issue I had by
just creating inline wrapper functions in cython for the C functions so I
dont have to link them when importing in other 3rd party cython modules.



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Using logfactorial instead of loggamma in random_poisson sampler

2021-03-08 Thread zoj613
What do you think is the explanation for that? I had assumed that using a
lookup table would be faster considering that the loggam implementation has
loops and makes calls to elementary functions in it.



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to get Boolean matrix for similar lists in two different-size numpy arrays of lists

2021-03-14 Thread zoj613
The following seems to produce what you want using the data provided

```
In [31]: dF = np.genfromtxt('/home/F.csv', delimiter=',').tolist()

In [32]: dS = np.genfromtxt('/home/S.csv', delimiter=',').tolist()

In [33]: r =  [True if i in lS else False for i in dF]

In [34]: sum(r)

Out[34]: 300
```

I hope this helps.



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Improving performance of the `numpy.any` function.

2021-04-14 Thread zoj613
Hi All,

I was using numpy's `any` function earlier and realized that it might not be
as performant as I assumed. See the code below:
```
In [1]: import numpy as np

In [2]: a = np.zeros(1_000_000)

In [3]: a[100] = 1

In [4]: b = np.zeros(2_000_000)

In [5]: b[100] = 1

In [6]: %timeit np.any(a)
1.33 ms ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [7]: %timeit np.any(b)
2.65 ms ± 71.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [8] %timeit any(a)
13.4 µs ± 354 ns per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [9]: %timeit any(b)
13.3 µs ± 219 ns per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
It looks like `np.any` does not terminate early as soon as it finds a
non-zero element, compared to the built-in any function. So this prompted me
to go look at the source code for `PyArray_Any`:
https://github.com/numpy/numpy/blob/623bc1fae1d47df24e7f1e29321d0c0ba2771ce0/numpy/core/src/multiarray/calculation.c#L790
.

Taking a peek at the `PyArray_GenericReduceFunction` it looks like it just
calls the `reduce` function from python on every single element of the array
and there is no mechanism in place to short-circuit the call if an element
has been found early.

It seems worthwhile to have a specific function that allows early stopping
of the reduction? So my question is: Is there a reason why maintainers did
not implement such a feature? I would also like to know if everyone thinks
its worthwhile to implement one? Im guessing there are many unsuspecting
users like myself who call the `np.any` function on arrays assuming
early-stopping is supported. I think existing code that calls the function
would benefit a lot. Let me know what you think.

Regards



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Improving performance of the `numpy.any` function.

2021-04-15 Thread zoj613
Although still much slower than the builtin `any`, this is an interesting and
strange alternative way to improve the timing. My speeds are a result of
using a _very_ old machine with a low-grade processor so maybe these times
are more exaggerated to me than others.



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Inaccurate documentation of the random c-api

2021-04-18 Thread zoj613
Hi All,

https://numpy.org/devdocs/reference/random/c-api.html has an inaccurate
description of the API that goes as follows:
"/zig in the name are based on a ziggurat lookup algorithm is used instead
of calculating the log, which is significantly faster. The non-ziggurat
variants are used in corner cases and for legacy compatibility./". It
appears this is no longer the case and instead there are functions that have
`inv` in the name to signal the use of the inverse method for sampling. I
submited a PR to reflect this and also added the missing function signatures
at https://github.com/numpy/numpy/pull/18797/files

Regards




--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Checking if all array elements are whole numbers using the C-API

2021-06-12 Thread zoj613
Hi All,

Is there a C-API analogue for `np.asarray(a, dtype=int) != np.asarray(a)`
assuming that `a` is an array or python sequence object of whole numbers but
is assigned the float type (e.g a=array([1., 2., 3.]) or [1., 2., 3.])? I
tried `np.PyArray_FROM_OT(a, np.NPY_LONG) != np.PyArray_FROM_O(a)` but I get
the error message:

`TypeError: Cannot cast array data from dtype('float64') to dtype('int64')
according to the rule 'safe'`.

Is there a way I can achieve what I want using the numpy array c-api? I want
to check if an array/sequence has all elements as whole numbers so that the
check wont fail is an element is "2.0" instead of "2".




--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Checking if all array elements are whole numbers using the C-API

2021-06-13 Thread zoj613
After reading the docs, it turns out that  `np.PyArray_FROM_OTF(a,
np.NPY_LONG, np.NPY_ARRAY_FORCECAST) != np.PyArray_FROM_O(a)` prevents the
error from occurring.



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion