[Numpy-discussion] Re: Arbitrarily large random integers

2023-08-19 Thread Kevin Sheppard
The easiest way to do this would be to write a pure Python implementation
of a masked integer sampler using Python ints.  This way you could draw
unsigned integers and treat them as a bit pool.  You would then take the
number of bits needed for your integer, transform these into a Python int,
and finally apply the mask.

This is how integers are generated in the legacy RandomState code.
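A minimal sketch of that masked-rejection approach, using a
numpy.random.Generator as the bit pool (the helper name randbelow is
illustrative, not an existing API):

import numpy as np

def randbelow(rng, high):
    # Sketch of a masked rejection sampler that returns Python ints.
    # rng is assumed to be a numpy.random.Generator used as the bit pool.
    k = (high - 1).bit_length()          # bits needed to cover [0, high)
    n_bytes = (k + 7) // 8               # random bytes per candidate draw
    mask = (1 << k) - 1                  # drop any excess high-order bits
    while True:
        value = int.from_bytes(rng.bytes(n_bytes), "big") & mask
        if value < high:                 # reject out-of-range candidates
            return value

rng = np.random.default_rng(0)
draws = [randbelow(rng, 20**20) for _ in range(3)]  # uniform on [0, 20**20)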

Kevin


On Sat, Aug 19, 2023, 15:43 Dan Schult  wrote:

> How can we use numpy's random `integers` function to get uniformly
> selected integers from an arbitrarily large `high` limit? This is important
> when dealing with exact probabilities in combinatorially large solution
> spaces.
>
> I propose that we add the capability for `integers` to construct arrays of
> type object_ by having it construct python int's as the objects in the
> returned array. This would allow arbitrarily large integers.
>
> The Python random library's `randrange` constructs values for arbitrary
> upper limits -- and they are exact when using subclasses of `random.Random`
> with a `getrandbits` method (which includes the default rng for most
> operating systems).
>
> Numpy's random `integers` function rightfully raises on `integers(20**20,
> dtype=int64)` because the upper limit is above what can be held in an
> `int64`. But Python `int` objects store arbitrarily large integers. So I
> would expect `integers(20**20, dtype=object)` to create random integers on
> the desired range. Instead a TypeError is raised `Unsupported dtype
> dtype('O') for integers`. It seems we could provide support for dtype('O')
> by constructing Python `int` values and this would allow arbitrarily large
> ranges of integers.
>
> The core of this functionality would be close to the seven lines used in
> [the code of random.Random._randbelow](
> https://github.com/python/cpython/blob/eb953d6e4484339067837020f77eecac61f8d4f8/Lib/random.py#L242)
> which
> 1) finds the number of bits needed to describe the `high` argument.
> 2) generates that number of random bits.
> 3) converts them to a python int and checks if it is larger than the input
> `high`. If so, repeat from step 2.
>
> I realize that people can just use `random.randrange` to obtain this
> functionality, but that doesn't return an array, and uses a different RNG
> possibly requiring tracking two RNG states.
>
> This text was also used to create [Issue #24458](
> https://github.com/numpy/numpy/issues/24458)
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: kevin.k.shepp...@gmail.com
>


[Numpy-discussion] Re: reminder: put an upper bound on setuptools if you use numpy.distutils!

2023-08-31 Thread Kevin Sheppard
On Sun, Aug 21, 2022 at 6:36 PM Ralf Gommers  wrote:

> Hi all,
>
> After setuptools 65.0 was released a few days ago, all users of
> numpy.distutils had their builds broken. This is already fixed in
> setuptools 65.0.2 because the breakage was particularly bad. However, the
> next breakage may not be fixed anymore (and more breakages *are* expected).
> So this is a good time to remind you all that you should put an upper bound
> on the setuptools version you allow in the releases of your package - to
> the last version that is known to work with your package.
>
> Our official stance here is that setuptools versions >=60 are not
> supported - see the "deprecating numpy.distutils" thread:
> https://mail.python.org/archives/list/numpy-discussion@python.org/message/PMU4P4YRP2FZA2Z6Z6Z74ZFYD6PCRXQ5/.
> Newer versions may work for you, depending on what features you use. They
> don't for NumPy and for SciPy; both projects pin to 59.X to avoid problems.
>
> For the recent issue with setuptools 65.0.0, see
> https://github.com/numpy/numpy/issues/22135. We have also made the
> warnings about this topic in our docs more explicit, see
> https://github.com/numpy/numpy/pull/22154.
>
> Cheers,
> Ralf
>
>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: kevin.k.shepp...@gmail.com


Is there a good way to pin setuptools like this when using oldest-supported-numpy?

-Kevin


[Numpy-discussion] Re: NEP 55 - Add a UTF-8 Variable-Width String DType to NumPy

2023-09-20 Thread Kevin Sheppard
On Wed, Sep 20, 2023 at 11:23 AM Ralf Gommers 
wrote:

>
>
> On Wed, Sep 20, 2023 at 8:26 AM Warren Weckesser <
> warren.weckes...@gmail.com> wrote:
>
>>
>>
>> On Fri, Sep 15, 2023 at 3:18 PM Warren Weckesser <
>> warren.weckes...@gmail.com> wrote:
>> >
>> >
>> >
>> > On Mon, Sep 11, 2023 at 12:25 PM Nathan 
>> wrote:
>> >>
>> >>
>> >>
>> >> On Sun, Sep 3, 2023 at 10:54 AM Warren Weckesser <
>> warren.weckes...@gmail.com> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Aug 29, 2023 at 10:09 AM Nathan 
>> wrote:
>> >>> >
>> >>> > The NEP was merged in draft form, see below.
>> >>> >
>> >>> > https://numpy.org/neps/nep-0055-string_dtype.html
>> >>> >
>> >>> > On Mon, Aug 21, 2023 at 2:36 PM Nathan 
>> wrote:
>> >>> >>
>> >>> >> Hello all,
>> >>> >>
>> >>> >> I just opened a pull request to add NEP 55, see
>> https://github.com/numpy/numpy/pull/24483.
>> >>> >>
>> >>> >> Per NEP 0, I've copied everything up to the "detailed description"
>> section below.
>> >>> >>
>> >>> >> I'm looking forward to your feedback on this.
>> >>> >>
>> >>> >> -Nathan Goldbaum
>> >>> >>
>> >>>
>> >>> This will be a nice addition to NumPy, and matches a suggestion by
>> >>> @rkern (and probably others) made in the 2017 mailing list thread;
>> >>> see the last bullet of
>> >>>
>> >>>
>> https://mail.python.org/pipermail/numpy-discussion/2017-April/076681.html
>> >>>
>> >>> So +1 for the enhancement!
>> >>>
>> >>> Now for some nitty-gritty review...
>> >>
>> >>
>> >> Thanks for the nitty-gritty review! I was on vacation last week and
>> haven't had a chance to look over this in detail yet, but at first glance
>> this seems like a really nice improvement.
>> >>
>> >> I'm going to try to integrate your proposed design into the dtype
>> prototype this week. If that works, I'd like to include some of the text
>> from the README in your repo in the NEP and add you as an author, would
>> that be alright?
>> >
>> >
>> >
>> > Sure, that would be fine.
>> >
>> > I have a few more comments and questions about the NEP that I'll finish
>> up and send this weekend.
>> >
>>
>> One more comment on the NEP...
>>
>> My first impression of the missing data API design is that
>> it is more complicated than necessary. An alternative that
>> is simpler--and is consistent with the pattern established for
>> floats and datetimes--is to define a "not a string" value, say
>> `np.nastring` or something similar, just like we have `nan` for
>> floats and `nat` for datetimes. Its behavior could be what
>> you called "nan-like".
>>
>
> Float `np.nan` and datetime missing value sentinel are not all that
> similar, and the latter was always a bit questionable (at least partially
> it's a left-over of trying to introduce generic missing value support I
> believe). `nan` is a float and part of C/C++ standards with well-defined
> numerical behavior. In contrast, there is no `np.nat`; you can retrieve a
> sentinel value with `np.datetime64("NaT")` only. I'm not sure if it's
> possible to generate a NaT value with a regular operation on a datetime
> array a la `np.array([1.5]) / 0.0`.
>
> The handling of `np.nastring` would be an intrinsic part of the
>> dtype, so there would be no need for the `na_object` parameter
>> of `StringDType`. All `StringDType`s would handle `np.nastring`
>> in the same consistent manner.
>>
>> The use-case for the string sentinel does not seem very
>> compelling (but maybe I just don't understand the use-cases).
>> If there is a real need here that is not covered by
>> `np.nastring`, perhaps just a flag to control the repr of
>> `np.nastring` for each StringDType instance would be enough?
>>
>
> My understanding is that the NEP provides the necessary but limited
> support to allow Pandas to adopt the new dtype. The scope section of the
> NEP says: "Fully agreeing on the semantics of a missing data sentinels or
> adding a missing data sentinel to NumPy itself.". And then further down:
> "By only supporting user-provided missing data sentinels, we avoid
> resolving exactly how NumPy itself should support missing data and the
> correct semantics of the missing data object, leaving that up to users to
> decide"
>
> That general approach I agree with, it's a large can of worms and not the
> main purpose of this NEP. Nathan may have more thoughts about what, if
> anything, from your suggestions could be adopted, but the general "let's
> introduce a missing value thing" is a path we should not go down here imho.
>
>
>
>>
>> If there is an objection to a potential proliferation of
>> "not a thing" special values, one for each type that can
>> handle them, then perhaps a generic "not a value" (say
>> `np.navalue`) could be created that, when assigned to an
>> element of an array, results in the appropriate "not a thing"
>> value actually being assigned. In a sense, I guess this NEP is
>> proposing that, but it is reusing the floating point object
>> `np.nan` as the generic "not a thing" value
>>
>
> It is explicitly not using `np.nan` but instead allowing 
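For reference, the user-provided sentinel pattern discussed above can be
sketched with the StringDType API as it later shipped in NumPy 2.0 (the
exact spelling differed while the NEP was still a draft):

import numpy as np

# A user-chosen missing-data sentinel is attached to the dtype instance.
dt = np.dtypes.StringDType(na_object=np.nan)
arr = np.array(["numpy", np.nan, "discussion"], dtype=dt)
print(arr)  # the nan-like sentinel round-trips through the array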

[Numpy-discussion] Re: Switching default order to column-major

2023-11-12 Thread Kevin Sheppard
I think you can always use order="F" in your own code.

If you patched NumPy and the downstream libraries then had to use your
customized NumPy, I think you would see some breakage.  Probably not a lot,
since many libraries use the Python NumPy API, which handles C or F order
well.  Some code does do things like call array.flat after creating an
array with default arguments from a list, or use "copy=True", and then
expects the data to be ordered as if order="C".
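A small illustration of requesting Fortran order explicitly, and of how
memory layout differs from the logical (C-order) view that most code sees:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], order="F")  # column-major copy
print(a.flags["F_CONTIGUOUS"])  # True
print(a.ravel(order="K"))       # memory order: [1 4 2 5 3 6]
print(a.ravel())                # logical C order: [1 2 3 4 5 6]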

Kevin


On Sat, Nov 11, 2023 at 3:03 PM Valerio De Benedetto 
wrote:

> Hi, I found that the documented default row-major order is enforced
> throughout the library with a series of `order='C'` default parameters, so
> given this I supposed there's no way to change the default (or am I wrong?)
> If, supposedly, I'd change that by patching the library (substituting 'C's
> for 'F's), do you think there would by any problem with other downstream
> libraries using numpy in my project? Do you think they assume a
> default-constructed array is always row-major and access the underlying
> data?
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: kevin.k.shepp...@gmail.com
>


[Numpy-discussion] Re: Arrays of variable itemsize

2024-03-13 Thread Kevin Sheppard
Does the new DType system in NumPy 2 make something like this more
possible?  I would suspect that the user would have to write a lot of code
to have reasonable performance if it was.
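As context for the fixed-size-reference point made in the quoted reply
below, a small illustration with an object-dtype array:

import numpy as np

# Each element of an object array is a fixed-size reference (pointer-sized),
# while the Python ints it refers to can be arbitrarily large.
a = np.array([0, 10**100], dtype=object)
print(a.itemsize)         # size of one reference, e.g. 8 on a 64-bit build
print(a[1].bit_length())  # 333 -- the referenced object is much larger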

Kevin


On Wed, Mar 13, 2024 at 3:55 PM Nathan  wrote:

> Yes, an array of references still has a fixed size width in the array
> buffer. You can think of each entry in the array as a pointer to some other
> memory on the heap, which can be a dynamic memory allocation.
>
> There's no way in NumPy to support variable-sized array elements in the
> array buffer, since that assumption is key to how numpy implements strided
> ufuncs and broadcasting.
>
> On Wed, Mar 13, 2024 at 9:34 AM Dom Grigonis 
> wrote:
>
>> Thank you for this.
>>
>> I am just starting to think about these things, so I appreciate your
>> patience.
>>
>> But isn’t it still true that all elements of an array are still of the
>> same size in memory?
>>
>> I am thinking along the lines of per-element dynamic memory management.
>> Such that if I had array [0, 1e1], the first element would default to
>> reasonably small size in memory.
>>
>> On 13 Mar 2024, at 16:29, Nathan  wrote:
>>
>> It is possible to do this using the new DType system.
>>
>> Sebastian wrote a sketch for a DType backed by the GNU multiprecision
>> float library:
>> https://github.com/numpy/numpy-user-dtypes/tree/main/mpfdtype
>>
>> It adds a significant amount of complexity to store data outside the
>> array buffer and introduces the possibility of use-after-free and dangling
>> reference errors that are impossible if the array does not use embedded
>> references, so that’s the main reason it hasn’t been done much.
>>
>> On Wed, Mar 13, 2024 at 8:17 AM Dom Grigonis 
>> wrote:
>>
>>> Hi all,
>>>
>>> Say python’s builtin `int` type. It can be as large as memory allows.
>>>
>>> np.ndarray on the other hand is optimized for vectorization via strides,
>>> memory structure and many things that I probably don’t know. Well the point
>>> is that it is convenient and efficient to use for many things in comparison
>>> to python’s built-in list of integers.
>>>
>>> So, I am thinking whether something in between exists? (And obviously
>>> something more clever than np.array(dtype=object))
>>>
>>> Probably something similar to `StringDType`, but for integers and
>>> floats. (It’s just my guess. I don’t know anything about `StringDType`,
>>> but just guessing it must be better than np.array(dtype=object) in
>>> combination with np.vectorize)
>>>
>>> Regards,
>>> dgpb
>>>
>>> ___
>>> NumPy-Discussion mailing list -- numpy-discussion@python.org
>>> To unsubscribe send an email to numpy-discussion-le...@python.org
>>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>>> Member address: nathan12...@gmail.com
>>>
>> ___
>> NumPy-Discussion mailing list -- numpy-discussion@python.org
>> To unsubscribe send an email to numpy-discussion-le...@python.org
>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> Member address: dom.grigo...@gmail.com
>>
>>
>> ___
>> NumPy-Discussion mailing list -- numpy-discussion@python.org
>> To unsubscribe send an email to numpy-discussion-le...@python.org
>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> Member address: nathan12...@gmail.com
>>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: kevin.k.shepp...@gmail.com
>


[Numpy-discussion] ENH: Add complex random normal generator (PR #9561)

2017-09-08 Thread Kevin Sheppard
I would like to add a complex random normal generator to
mtrand/RandomState.

A scalar complex normal is a (double) bivariate normal.  The main
motivation is to simplify the construction of complex normals, which are
generally parameterized in terms of three values: location, covariance and
relation.  Location is the same as in a standard normal.  The covariance
and the relation jointly determine the variances of the real and imaginary
parts as well as the covariance between the two.
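A sketch of that mapping from (location, covariance, relation) to the
underlying bivariate normal, written against the modern Generator API for
illustration (the helper name and keyword names are not an existing API):

import numpy as np

def complex_normal(rng, loc=0.0, gamma=1.0, relation=0.0, size=None):
    # gamma = E|z - loc|**2 (real, >= 0); relation = E[(z - loc)**2] (complex).
    # Map these to the covariance of the real/imaginary bivariate normal.
    cov = 0.5 * np.array([[gamma + np.real(relation), np.imag(relation)],
                          [np.imag(relation), gamma - np.real(relation)]])
    mean = np.array([np.real(loc), np.imag(loc)])
    xy = rng.multivariate_normal(mean, cov, size=size)
    return xy[..., 0] + 1j * xy[..., 1]

rng = np.random.default_rng(0)
z = complex_normal(rng, loc=1 + 1j, gamma=4.0, relation=1 + 0.5j, size=5)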

#1 The initial implementation in the PR has followed the standard template
for scalar RV generators with three paths, scalar->scalar, scalar->array
and array->array.  It is bulky since the existing array fillers that handle
the scalar->array and array->array for double rvs cannot be used.  It
supports broadcasting and has a similar API to other scalar RV generators
(e.g. normal).

#2 The PR discussion has moved towards exploiting the relationship with the
multivariate normal. Currently the MV normal doesn't broadcast, and so
following this path would only allow the scalar->scalar and the
scalar->array paths.  This could theoretically be extended to allow
broadcasting if multivariate_normal was extended to allow broadcasting.

#3 If broadcasting is off the table, then it might make more sense to skip
a scalar complex normal and just move directly to a
multivariate_complex_normal since this is also just a higher-dimension
(double) multivariate normal. This function could just wrap
multivariate_normal and would be relatively straightforward.  The
only downside of this path is that it would not easily support a
scalar->scalar path, although this could be added.

There probably isn't much of a performance hit from following #2 or #3.  I
checked how normal and multivariate normal perform for large draws:

%timeit np.random.normal(2.0,4.0,size=100)
30.8 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np.random.multivariate_normal([2.0],[[4.0]],size=100)
32.2 ms ± 308 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

For smaller draws the performance difference is larger:

%timeit np.random.normal(2.0,4.0,size=10)
2.95 µs ± 16.5 ns per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np.random.multivariate_normal([2.0],[[4.0]],size=10)
49.4 µs ± 249 ns per loop (mean ± std. dev. of 7 runs, 1 loops each)

And for scalars the scalar path is about 30x faster than the multivariate
path.  It is also worth noting that multivariate_normal will only return a
vector even if the inputs only generate a single scalar.

%timeit np.random.normal(2.0,4.0)
1.42 µs ± 3.05 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np.random.multivariate_normal([2.0],[[4.0]])
47.9 µs ± 167 ns per loop (mean ± std. dev. of 7 runs, 1 loops each)

It would be helpful to determine which path,

#1 Clone standard scalar generator with 3 paths including broadcasting
#2 Scalar generator using multivariate normal, excluding broadcasting
#3 Multivariate generator using multivariate normal, excluding broadcasting

is the preferred one.

Kevin


Re: [Numpy-discussion] Moving NumPy's PRNG Forward

2018-01-26 Thread Kevin Sheppard
I am a firm believer that the current situation is not sustainable.  There
are a lot of improvements that can practically be incorporated.  While many
of these are performance related, there are also improvements in accuracy
over some ranges of parameters that cannot be incorporated. I also think
that perfect stream reproducibility is a bit of a myth across versions
since this really would require identical OS, compiler and possibly CPU for
some of the generators that produce floats.

I believe there is a case for separating the random generator from core
NumPy.  Some points that favor becoming a subproject:

1. It is a pure consumer of the NumPy API.  Other parts of the API do not
depend on random.
2. A standalone package could be installed alongside many different
versions of core NumPy, which would reduce the pressure on freezing the
stream.

In terms of what is needed, I think that the underlying PRNG should be
swappable.  This will provide a simple mechanism to allow certain types of
advancement while easily providing backward compat.  In the current design
this is very hard and requires compiling many nearly identical copies of
RandomState. In pseudocode something like

standard_normal(prng)

where prng is a basic class that retains the PRNG state and has a small set
of core random number generators that belong to the underlying PRNG --
probably something like int32, int64, double, and possibly int53. I am not
advocating explicitly passing the PRNG as an argument, but having
generators which can take any suitable PRNG would add a lot of flexibility
in terms of taking advantage of improvements in the underlying PRNGs (see,
e.g., xoroshiro128/xorshift1024).  The "small" core PRNG would have
responsibility over state and streams.  The remainder of the module would
transform the underlying PRNG into the required distributions.

This would also simplify making improvements, since old versions could be
saved or improved versions could be added to the API.  For example,

from numpy.random import standard_normal, prng  # Preferred versions
standard_normal(prng)  # Ziggurat
from numpy.random.legacy import standard_normal_bm, mt19937  # legacy generators
standard_normal_bm(mt19937)  # Box-Muller
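For comparison, the design that eventually shipped looks like the following
(a sketch using the released Generator/BitGenerator API rather than the
pseudocode above):

import numpy as np

# A Generator transforms draws from a swappable core bit generator.
g = np.random.Generator(np.random.PCG64(12345))
x = g.standard_normal(5)   # ziggurat-based normals

# The legacy path is preserved in RandomState (polar Box-Muller normals).
legacy = np.random.RandomState(np.random.MT19937(12345))
y = legacy.standard_normal(5)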

Kevin


Re: [Numpy-discussion] Moving NumPy's PRNG Forward

2018-01-29 Thread Kevin Sheppard
I agree with pretty much everything you wrote, Robert.  I didn't have quite
the right frame, but the generic class that takes a low-level core PRNG
sounds like the right design, and this should make user-generated
distributions easier to develop.  I was thinking along these lines, inspired
by the SciPy changes that use a LowLevelCallable, e.g.,

https://ilovesymposia.com/2017/03/12/scipys-new-lowlevelcallable-is-a-game-changer/


This might also allow users to extend the core PRNGs using something like
Numba JIT classes as an alternative.

Another area that needs thought is how to correctly spawn PRNGs in
multiprocess applications. This might be most easily addressed by providing
a guide on a good way to do it rather than the arbitrary way used now.

Kevin





On Sat, Jan 27, 2018 at 5:03 PM  wrote:

> Message: 1
> Date: Sat, 27 Jan 2018 09:28:54 +0900
> From: Robert Kern 
> To: Discussion of Numerical Python 
> Subject: Re: [Numpy-discussion] Moving NumPy's PRNG Forward
>
> On Sat, Jan 27, 2018 at 1:14 AM, Kevin Sheppard <
> kevin.k.shepp...@gmail.com>
> wrote:
> >
> > I am a firm believer that the current situation is not sustainable.
> There are a lot of improvements that can practically be incorporated.
> While many of these are performance related, there are also improvements in
> accuracy over some ranges of parameters that cannot be incorporated. I also
> think that perfect stream reproducibility is a bit of a myth across
> versions since this really would require identical OS, compiler and
> possibly CPU for some of the generators that produce floats.
> >
> > I believe there is a case for separating the random generator from core
> NumPy.  Some points that favor becoming a subproject:
> >
> > 1. It is a pure consumer of NumPy API.  Other parts of the API do no
> depend on random.
> > 2. A stand alone package could be installed along side many different
> version of core NumPy which would reduce the pressure on freezing the
> stream.
>
> Removing numpy.random (or freezing it as deprecated legacy while all PRNG
> development moves elsewhere) is probably a non-starter. It's too used for
> us not to provide something. That said, we can (and ought to) make it much
> easier for external packages to provide PRNG capabilities (core PRNGs and
> distributions) that interoperate with the core functionality that numpy
> provides. I'm also happy to place a high barrier on adding more
> distributions to numpy.random once that is in place.
>
> Specifically, core uniform PRNGs should have a small common C API that
> distribution functions can use. This might just be a struct with an opaque
> `void*` state pointer and then 2 function pointers for drawing a uint64
> (whole range) and a double in [0,1) from the state. It's important to
> expose our core uniform PRNGs as a C API because there has been a desire to
> interoperate at that level, using the same PRNG state inside C or Fortran
> or GPU code. If that's in place, then people can write new efficient
> distribution functions in C that use this small C API agnostic to the core
> PRNG algorithm. It also makes it easy to implement new core PRNGs that the
> distribution functions provided by numpy.random can use.
>
> > In terms of what is needed, I think that the underlying PRNG should be
> swappable.  The will provide a simple mechanism to allow certain types of
> advancement while easily providing backward compat.  In the current design
> this is very hard and requires compiling many nearly identical copies of
> RandomState. In pseudocode something like
> >
> > standard_normal(prng)
> >
> > where prng is a basic class that retains the PRNG state and has a small
> set of core random number generators that belong to the underly

[Numpy-discussion] Prototype CorePRNG implementation feedback

2018-02-22 Thread Kevin Sheppard
I have implemented a working prototype of a pluggable RandomState.  The
repo is located at https://github.com/bashtage/core-prng.

The main design has a small PRNG (currently only SplitMix64 and
Xoroshiro128 are implemented) that does not have any public functions
available to generate random numbers.  These objects only expose methods to
get and set state and to jump the PRNG.

The random number generator is exposed via an opaque structure wrapped in a
PyCapsule.  This structure has two void pointers, one to the state and one
to the function that generates the next uint64.  I think in the long run
it might make sense to expose a few additional interfaces, such as the next
random double (makes sense for dSFMT) or the next random uint32 (makes
sense for MT19937).  For now, I have deliberately kept it simple.

The user-facing object is called `RandomGenerator` and it can be
initialized using the syntax `RandomGenerator()` or
`RandomGenerator(Xoroshiro128())`.  The object exposes user-facing methods
(currently only `random_integer` and `random_double`).

A basic demo:

from core_prng.generator import RandomGenerator
# Splitmix 64 for now
RandomGenerator().random_integer()

from core_prng.xoroshiro128 import Xoroshiro128
RandomGenerator(Xoroshiro128()).random_integer()

A few questions that have come up:

   1. Should a passed core PRNG be initialized or not?  The syntax
   RandomGenerator(Xoroshiro128) looks nicer than
   RandomGenerator(Xoroshiro128()).
   2. Which low-level methods should be part of the core PRNG?  These would
   all be scalar generators that would be consumed by RandomGenerator and
   exposed through void *. I could imagine some subset of the following:
      - uint64
      - uint32
      - double
      - float
      - uint53 (applicable to PRNGs that use doubles as their default type)
   3. Since RandomState is taken, a new name is needed for the user-facing
   object.  RandomGenerator is a placeholder but ideally something meaningful
   could be chosen.
   4. I can't see how locking can be usefully implemented in the core PRNGs
   without a large penalty. This means that these are not threadsafe.  This
   isn't a massive problem, but they do access the state (get or set), albeit
   while holding the GIL.
   5. Should state be a property?  This isn't Java...


Other feedback is of course also welcome.

Kevin


Re: [Numpy-discussion] Prototype CorePRNG implementation feedback (Neal Becker)

2018-02-22 Thread Kevin Sheppard
>
>
> What is the syntax to construct an initialized generator?
> RandomGenerator(Xoroshiro128(my_seed))?
>
>
Not 100% certain on this.  There was talk in the earlier thread that seed
should be killed, although I wasn't clear what method would be
preferable for getting easily reproducible random numbers.

Another option would be

RandomGenerator(Xoroshiro128, seed=my_seed)

Kevin


Re: [Numpy-discussion] NEP: Random Number Generator Policy

2018-06-03 Thread Kevin Sheppard
The seed() discussion seems unnecessary.  StableRandom will need to have a 
method to set/get state
which can be used by any project that needs to get reproducible numbers from 
the module-level generator.

While this is an implementation detail, many generators have much smaller 
states than MT19937 
(a few uint64s). So this is easy enough to hard code where needed.




Re: [Numpy-discussion] NEP: Random Number Generator Policy

2018-06-04 Thread Kevin Sheppard
I’m not sure if this is within the scope of the NEP or an implementation 
detail, but I think a new PRNG should use platform independent integer types 
rather than depending on the platform’s choice of 64-bit data model.  This 
should be enough to ensure that any integer distribution that only uses 
integers internally should produce identical results across uarch/OS.




Re: [Numpy-discussion] NEP: Random Number Generator Policy (Robert Kern)

2018-06-11 Thread Kevin Sheppard
Maybe a good place for a stable, testing focused generator would be in 
numpy.random.testing.  This could host a default implementation of 
StableGenerator, although a better name might be TestingGenerator.  It would 
also help users decide that this is not the generator they are looking for (I 
think many people might think StableGenerator is a good thing, after all, who 
wants an UnstableGenerator).




[Numpy-discussion] NEP 21: Simplified and explicit advanced indexing

2018-06-26 Thread Kevin Sheppard
This seems like progress, and a clear method for outer indexing will help many
users.

As for names, I prefer .ox and .vx as shorthand for .oindex and .vindex.  I
don’t like the .ox_ or .o_ syntax.

Before any deprecation warnings or any other warnings are added it would be 
helpful to have some way to set a flag on Python to show some sort of 
HiddenDeprecationWarning (or OnlyShowIfFlagPassesDeprecationWarning) that would 
automatically be filtered by default but could be shown if someone was 
interested.  This will allow library writers to see problems before any start 
showing up for users. These could then be promoted to Visible or Future later.
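A sketch of how such an opt-in warning could work with the standard warnings
machinery (HiddenDeprecationWarning is hypothetical here, as in the text above):

import warnings

class HiddenDeprecationWarning(DeprecationWarning):
    # Hypothetical category: silenced by default, opt-in for library authors.
    pass

warnings.simplefilter("ignore", HiddenDeprecationWarning)  # hidden by default

def legacy_fancy_index(arr, idx):
    warnings.warn("this indexing behaviour will change",
                  HiddenDeprecationWarning, stacklevel=2)
    return arr[idx]

# A library author who wants to see the warnings flips the filter:
warnings.simplefilter("always", HiddenDeprecationWarning)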


[Numpy-discussion] Complex Normal Generator in NEP-19 extension

2019-03-29 Thread Kevin Sheppard
One part of moving randomgen closer to fulfilling NEP-19 is rationalizing
the API, especially new features not in RandomState. Matti Picus has made a
lot of progress in getting it integrated, especially the part that replaces
RandomState with a shimmed version of the new generator.

There is only one new method in the generator, a scalar generator for
complex normals. It is scalar in the sense that it is the complex version
of np.random.normal, and so supports broadcasting.

This was written based on some GH comments.  This would be a new API and so
it needs to come here first to see if there is any support.

If there is support, then it will appear in the new RandomGenerator, but
not the RandomState replacement. If not, then we can just delete it.

Kevin


Re: [Numpy-discussion] overhauling numpy.random and randomgen

2019-04-19 Thread Kevin Sheppard
>  Rather than "base RNG", what about calling these classes a "random
source"
or "random stream"? In particular, I would suggest defining two Python
classes:
> - np.random.Generator as a less redundant name for what is currently
called
RandomGenerator
> - np.random.Source or np.random.Stream as an abstract base class for what
are currently called "base RNGs"

Naming is definitely hard.  Simple RNGs are currently called basic RNGs,
which was inspired by mkl-random. 'source' sounds OK to me, but sort of
hides the fact that these are the actual pseudo RNGs. `stream` has a
technical meaning (a single PRNG may produce multiple independent streams)
and IMO should be avoided since this might lead to confusion.  Perhaps
source_rng (or in docs Source RNG)?

RandomGenerator is actually RandomTransformer, but I didn't like the latter
name.

> There are also a couple of convenience attributes in the user-facing API
that I would suggest refining:
>   - The "brng" attribute of RandomGenerator is not a very descriptive
name. I
would prefer "stream" or "source", or the more explicit "base_rng" if we
stick with that term.
>   - I don't think we need the "generator" property on base RNG objects.
It is
fine to require writing np.random.Generator(base) instead. Looking at the
implementation, .generator caches the RandomGenerator objects it creates on
the base RNG, which creates a reference cycle. Yes, Python can garbage
collect reference cycles, but this is still a muddled data model.

The attribute name should match the final (descriptive) name, whatever it
is.  In RandomGen I am using the `basic_rng` attribute name, but this could
be `source`.  I also use a property so that the attribute can have a
docstring attached for use in IPython. I think this is more user-friendly.

I think dropping the `generator` property on the basic RNGs is reasonable.
It was a convenience but is awkward, and I always understood that it
creates a cycle.

> Finally, why do we expose the np.random.gen object? I thought part of the
idea with the new API was to avoid global mutable state.

Module level functions are essential for quick experiments and should be
provided.  The only difference here is that the singleton `seed`  and
`state` are no longer exposed so that it isn't possible (using the exposed
API) to set the seed.


[Numpy-discussion] Creating a subclass that never propagates

2019-07-16 Thread Kevin Sheppard
I am trying to make a subclass that never propagates, so that when it
interacts with another ndarray, or even with itself, the return type is
always ndarray.  Is this possible?

I got pretty far with

import numpy as np

class ArrayLike(np.ndarray):
    def __array_wrap__(self, out_arr, context=None):
        if out_arr.shape == ():
            return out_arr.item()  # if ufunc output is scalar, return it
        else:
            out = super(ArrayLike, self).__array_wrap__(out_arr, context)
            # Never return ArrayLike
            if isinstance(out, ArrayLike):
                out = out.view(np.ndarray)
            return out

Which works well for ufuncs.  However, when I try other functions like
`dot` I get my subclass type returned.

Is there a reasonable way to ensure that my subclass doesn't propagate? I
think I would need some way to override the behavior when .view(MySubClass)
is called.

Thanks,
Kevin


[Numpy-discussion] Low-level API for Random

2019-09-19 Thread Kevin Sheppard
There are some users of the NumPy C code in randomkit.  This was never
officially supported.  There has been a long open issue to provide this
officially.

When I wrote randomgen I supplied .pxd files that make it simpler to write
Cython code that uses the components.  The lower-level API has not had much
scrutiny and is in need of a clean-up.   I thought this would also
encourage users to extend the random machinery themselves as part of their
project or code so as to minimize the requests for new (exotic)
distributions to be included in Generator.

Most of the generator functions follow a pattern random_DISTRIBUTION.  Some
have a bit more name mangling which can easily be cleaned up (like
random_gauss_zig, which should become PREFIX_standard_normal).

Ralf Gommers suggested unprefixed names. I tried this in a local branch and
it was a bit ugly since some of the distributions have common math names
(e.g., gamma) and others are very short (e.g., t or f).  I think a prefix
is needed, and after looking through the C API docs npy_random_ seemed like
a reasonable choice (since these live in numpy.random).

Any thoughts on the following questions are welcome (others too):

1. Should there be a prefix on the C functions?
2. If so, what should the prefix be?
3. Should the legacy C functions be part of the API -- these are mostly the
ones that produce or depend on polar transform normals (Box-Muller). I have
a feeling no, but there may be reasons to prefer BM since they do not
depend on rejection sampling.
4. Should the low-level API be consumable like any other NumPy C API by
including the usual header locations and library locations?  Right now, the
pxd simplifies writing Cython, but users have to specify the location of the
headers and source manually.  An alternative would be to provide a function
like np.get_include() -> np.random.get_include() that would specialize in
random.

Kevin


Re: [Numpy-discussion] Low-level API for Random

2019-09-19 Thread Kevin Sheppard
On Thu, Sep 19, 2019 at 10:23 AM Ralf Gommers 
wrote:

>
>
> On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard <
> kevin.k.shepp...@gmail.com> wrote:
>
>> There are some users of the NumPy C code in randomkit.  This was never
>> officially supported.  There has been a long open issue to provide this
>> officially.
>>
>> When I wrote randomgen I supplied .pdx files that make it simpler to
>> write Cython code that uses the components.  The lower-level API has not
>> had much scrutiny and is in need of a clean-up.   I thought this would also
>> encourage users to extend the random machinery themselves as part of their
>> project or code so as to minimize the requests for new (exotic)
>> distributions to be included in Generator.
>>
>> Most of the generator functions follow a pattern random_DISTRIBUTION.
>> Some have a bit more name mangling which can easily be cleaned up (like
>> ranomd_gauss_zig, which should become PREFIX_standard_normal).
>>
>> Ralf Gommers suggested unprefixed names.
>>
>
> I suggested that the names should match the Python API, which I think
> isn't quite the same. The Python API doesn't contain things like "gamma",
> "t" or "f".
>

By gamma and f (I misspoke about t) I mean the names that appear as
Generator methods:

https://docs.scipy.org/doc/numpy/reference/random/generator.html#numpy.random.Generator


If I understand your point (and with reference to the page linked below),
then there would be something like numpy.random.cython_random.gamma (which
is currently called numpy.random.distributions.random_gamma). Maybe I'm not
understanding your point about the Python API though.


>
> I tried this in a local branch and it was a bit ugly since some of the
>> distributions have common math names (e.g., gamma) and others are very
>> short (e.g., t or f).  I think a prefix is needed, and after looking
>> through the C API docs npy_random_ seemed like a reasonable choice (since
>> these live in numpy.random).
>>
>> Any thoughts on the following questions are welcome (others too):
>>
>> 1. Should there be a prefix on the C functions?
>> 2. If so, what should the prefix be?
>>
>
> Before worrying about naming details, can we start with "what should be in
> the C/Cython API"? If I look through the current pxd files, there's a lot
> there that looks like it should be private, and what we expose as Python
> API is not all present as far as I can tell (which may be fine, if the only
> goal is to let people write new generators rather than use the existing
> ones from Cython without the Python overhead).
>

From the ground up, for someone who wants to write a new distribution:
1. The bit generators.  These currently have no pxd files. These are always
going to be Python objects and so it isn't absolutely essential to expose
them with a low-level API.  All that is really needed is the capsule which
holds the bitgen struct.
2. bitgen_t, which is in common.pxd.  This is essential since it enables
access to the callables that produce basic pseudo-random values.
3. The distributions, which are in distributions.pxd. The integer
generators are in bounded_integers.pxd.in, which would need to be processed
and then included after processing (same for bounded_integers.pxd.in).
   a. The legacy functions in legacy_distributions.pxd.  If the legacy is
included, then aug_bitgen_t, which is also in legacy_distributions.pxd,
needs to be included as well.
4. The "helpers", which are defined in common.pxd.  These simplify
implementing complete distributions which support automatic broadcasting
when needed. They are only provided to match the signatures of the
functions in distributions.pxd. The highest-level ones are cont() and
disc(). Some of the lower-level ones could easily be marked as private.

1,2 and 3 are pretty important.  4 could be in or out. It could help if
someone wanted to write a fully featured distribution w/ broadcasting, but
I think this use case is less likely than someone say wanting to implement
a custom rejection sampler.


For someone who wants to write a new BitGenerator

1. BitGenerator and SeedSequence in bit_generator.pxd are required, as is
bitgen_t, which is in common. bitgen_t should probably move to
bit_generators.
2. aligned_malloc:  This has been requested on multiple occasions and is
practically important when interfacing with SSE or AVX code. It is
potentially more general than the random module. This lives in common.pxd.



>
> In the end we want to get to a doc section similar to
> http://scipy.github.io/devdocs/special.cython_special.html I'd think.
>
> 3. Should the legacy C functions be part of the API -- these are mostly
>> the ones that produce 

Re: [Numpy-discussion] Low-level API for Random

2019-09-25 Thread Kevin Sheppard
>
> I'd like to clarify what exactly we mean by exposing a C API.  Do we have
> in mind that our random number generators can be used from standalone C
> code, or via Cython `cimport` like with the current numpy.pxd?
>
> It sounds like we want to expose the highest level generators; do we also
> want to provide access to the bit streams?
>
>
What do you mean by standalone C?  A Python extension is written in C (but
not Cython)?  Or a C application that doesn't include Python.h?  The former
is pretty easy since you can use a few PyObjects to simplify initializing
the bit generator, and the rest of the code can be directly used in C
without any more Python objects.  The latter is also doable although the
low-level functions needed to initialize the bit generators (which are just
C structs) have no standardization.  I think the only component in a
standalone C application that would need some non-trivial work is
SeedSequence (i.e., more than changing function names or reorganizing
files).

Like Robert, I suspect that Cython users would be the largest immediate
beneficiaries of a lower-level API.

numba end-users can already consume the bit generators through the exposed
CFFI/ctypes interface they provide.  These can then be used with the
higher-level generators, although end users have to build a shared lib/DLL
first. Getting the C API in shape to be used directly by numba is probably
a bigger task.
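For example, the ctypes interface exposed on the bit generators can be
consumed directly from Python (a small sketch; the same fields are also
available through the cffi interface):

import numpy as np

bg = np.random.PCG64(12345)
iface = bg.ctypes                 # ctypes view of the underlying bitgen_t
next_double = iface.next_double   # callable taking the state address
state = iface.state_address       # the interface also exposes .state
print([next_double(state) for _ in range(3)])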

Kevin


Re: [Numpy-discussion] Is the build-system section of pyproject.toml a maintained list of the build dependencies?

2020-01-08 Thread Kevin Sheppard
Recent versions of pip read the pyproject.toml file to get the build
dependencies, install these in an isolated environment to build the wheel,
and then install the wheel.  The requires=[...] in the pyproject is not
installed in the original environment, so when run on NumPy you would only
end up with NumPy installed. Are you trying to get Cython installed after
an install of NumPy?  If you want this then it needs to be listed in the
setup as a dependency.
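For the question quoted below about extracting the build requirements, a few
lines of Python along the lines Warren describes are enough (a sketch;
tomllib requires Python 3.11+, older versions can use the third-party toml
package):

import tomllib  # Python 3.11+

with open("pyproject.toml", "rb") as f:
    requires = tomllib.load(f)["build-system"]["requires"]
print(requires)  # e.g. ['setuptools', 'wheel', 'Cython>=0.29.14']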

On Wed, Jan 8, 2020 at 4:38 PM Warren Weckesser 
wrote:

> On 1/8/20, Warren Weckesser  wrote:
> > I'm doing some work on the travis-ci scripts, and I'd like to remove
> > some redundant calls of 'pip install'.  The scripts should get the
> > build dependencies from a configuration file instead of having
> > hard-coded pip commands.  Is pyproject.toml the appropriate file to
> > use for this?  (Note: we also have test_requirements.txt, but as the
> > name says, those are the dependencies for running the tests, not for
> > building numpy.)
> >
>
> Updating my question:  `pyproject.toml` lists numpy's build
> dependencies in the `build_system` section of the file:
>
> [build-system]
> # Minimum requirements for the build system to execute.
> requires = [
> "setuptools",
> "wheel",
> "Cython>=0.29.14",  # Note: keep in sync with tools/cythonize.py
> ]
>
> So the file serves the equivalent purpose of a `requirements.txt`
> file.  Is there an option to pip that would allow something like
>
> pip install -r pyproject.toml
>
> (with some other flag or option as needed) to install the build
> requirements found in pyproject.toml?  In
> https://github.com/numpy/numpy/pull/15275, I wrote a few lines of
> Python to get the dependencies from pyproject.toml, but it seems like
> that shouldn't be necessary.
>
> Warren
>
>
> > Warren
> >
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


Re: [Numpy-discussion] what to clean up, what to leave as is

2020-01-24 Thread Kevin Sheppard
I think some types of clean-ups, for example imports, are pretty low cost
and low risk and don't have much bearing on the rest of the code, so it
might be best to do them all at once.

f-strings are also pretty simple but can be abused to the detriment of the
surrounding code (for example, moving a string defined in a variable
outside a function into the function just so it can have an f-string).

I think more general code clean-up, for example removing list around
iterators (e.g., list(map(f, i))), should be mostly avoided unless there is
a compelling case to prefer the iterator for performance reasons.

Kevin


On Fri, Jan 24, 2020 at 5:29 PM Charles R Harris 
wrote:

>
>
> On Fri, Jan 24, 2020 at 9:46 AM Ralf Gommers 
> wrote:
>
>> Hi all,
>>
>> It's great to see that people are jumping at the chance to clean up
>> Python 2 support. I would however caution about overdoing it on other
>> cleanups. As a reminder, we normally do not want pure style PRs (e.g. PEP8
>> cleanups), because they make the code history (git blame, commits on
>> particular files, etc.) harder to work with, have review overhead, and may
>> introduce new bugs for little gain.
>>
>> Imho that same rationale applies to things like converting strings to
>> f-strings. There's of course some gray area, for example removing "from ...
>> import *" can guard against accidentally exposing new API, so can be
>> considered a valuable cleanup.
>>
>> As a separate/additional point: numpy.distutils and numpy.f2py are
>> largely untested, PRs are hard to test locally because of platform-specific
>> code, and changes often introduce regressions. So even for some cleanups
>> that are okay for other files, please do not do them on those modules.
>>
>>
> I do like f-strings, they can make the code simpler and more readable.
>
> Chuck
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


Re: [Numpy-discussion] recent changes in np.maximum.accumulate ?

2020-02-18 Thread Kevin Sheppard
I have tried to locally reproduce this but cannot.  I do not have AVX-512 on
my machine.  Fortunately we dump CPU info on travis and lo and behold it has
AVX-512.

*-cpu:0
   description: CPU
   product: Intel(R) Xeon(R) CPU
   vendor: Intel Corp.
   physical id: 1001
   bus info: cpu@0
   slot: CPU 1
   size: 2GHz
   capacity: 2GHz
   width: 64 bits
   capabilities: fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep
   mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx
   pdpe1gb rdtscp x86-64 constant_tsc rep_good nopl xtopology nonstop_tsc
   cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2
   x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm
   3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow
   flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms
   invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd
   avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear
   arch_capabilities

Sounds like the AVX-512 path is bugged.

Kevin


From: Sebastian Berg
Sent: Tuesday, February 18, 2020 3:42 PM
To: numpy-discussion@python.org
Subject: Re: [Numpy-discussion] recent changes in np.maximum.accumulate ?

On Tue, 2020-02-18 at 10:14 -0500, josef.p...@gmail.com wrote:
> I'm trying to track down test failures of statsmodels against recent
> master dev versions of numpy and scipy.
>
> The core computation is the following in one set of tests that fail
>
> pvals_corrected_raw = pvals * np.arange(ntests, 0, -1)
> pvals_corrected = np.maximum.accumulate(pvals_corrected_raw)
>

Hmmm, the two git hashes indicate few changes between the two versions
(mainly unicode related).

However, recently there was also the addition of AVX-512F loops to
maximum, so that seems like the most reasonable candidate (although I
am unsure it changed exactly between those versions, it is also more
complex maybe due to needing a machine that supports the instructions).

Some details about the input could be nice. But if this is all that is
as input, it sounds like it should be a contiguous array? I guess it
might include subnormal numbers or NaN?

Can you open an issue with some of those details if you have them?

- Sebastian


> this numpy version
> numpy-1.19.0.dev0+20200214184618_1f9ab28-cp38-cp38-manylinux2010_x86_64.whl
> is in the test run with failures (the first time statsmodel master
> failed)
>
> the previous version in the test runs didn't have these failures
> numpy-1.19.0.dev0+20200212232857_af0dfce-cp38-cp38-manylinux1_x86_64.whl
>
> I'm right now just fishing for candidates for the failures. And I'm
> not running any dev versions on my computer.
>
> Were there any recent changes that affect np.maximum.accumulate?
>
> Josef


Re: [Numpy-discussion] Feelings about type aliases in NumPy

2020-04-24 Thread Kevin Sheppard
Typing is for library developers more than end users. I would also worry
that putting it into the top level might discourage adding other typing
classes, since it is more difficult to add to the top level than to a
lower-level module. np.typing seems very clear to me.
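The aliases did end up in a numpy.typing submodule in later releases; a
small usage sketch:

import numpy as np
import numpy.typing as npt

# Annotate a function that accepts anything coercible to an array.
def scale(x: npt.ArrayLike, factor: float) -> np.ndarray:
    return np.asarray(x) * factor

scale([1, 2, 3], 2.0)  # lists, tuples, scalars and arrays all type-check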

On Sat, Apr 25, 2020, 07:41 Stephan Hoyer  wrote:

>
>
> On Fri, Apr 24, 2020 at 11:31 AM Sebastian Berg <
> sebast...@sipsolutions.net> wrote:
>
>> On Fri, 2020-04-24 at 11:10 -0700, Stefan van der Walt wrote:
>> > On Fri, Apr 24, 2020, at 08:45, Joshua Wilson wrote:
>> > > But, Stephan pointed out that it might be confusing to users for
>> > > objects to only exist at typing time, so we came around to the
>> > > question of whether people are open to the idea of including the
>> > > type
>> > > aliases in NumPy itself. Ralf's concrete proposal was to make a
>> > > module
>> > > numpy.types (or maybe numpy.typing) to hold the aliases so that
>> > > they
>> > > don't pollute the top-level namespace. The module would initially
>> > > contain the types
>> >
>> > That sounds very sensible.  Having types available with NumPy should
>> > also encourage their use, especially if we can add some documentation
>> > around it.
>>
>> I agree, I might have a small tendency for `numpy.types` if we ever
>> find any usage other than direct typing that may be the better name?
>
>
> Unless we anticipate adding a long list of type aliases (more than the
> three suggested so far), I would lean towards adding ArrayLike to the top
> level NumPy namespace as np.ArrayLike.
>
> Type annotations are becoming an increasingly core part of modern Python
> code. We should make it easy to appropriately type check functions that act
> on NumPy arrays, and a top level np.ArrayLike is definitely more convenient
> than np.types.ArrayLike.
>
> Out of curiousity, I guess `ArrayLike` would be an ABC that a
>> downstream project can register with?
>
>
> ArrayLike will be a typing Protocol, automatically recognizing attributes
> like __array__ to indicate that something can be cast to an array.
>
>
>>
>> - Sebastian
>>
>>
>> >
>> > Stéfan
>> > ___
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@python.org
>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


Re: [Numpy-discussion] Armv8 server donation

2020-06-11 Thread Kevin Sheppard
Someone could set up https://drone.io/ which has both aarch64 and armv8l
servers. I have used this service without issue in randomgen, which requires
building NumPy (but not testing it).

On Thu, Jun 11, 2020 at 2:21 PM Stefan van der Walt 
wrote:

> On Thu, Jun 11, 2020, at 13:47, ChunLin Fang wrote:
>
> I noticed that the shippable CI is always skipped after a PR is submitted.
> The reason why it is skipped seems to be "No active nodes found in shared node
> pool "shippable_shared_aarch64"".
> Potential bugs may be buried throughout numpy without shippable CI.
> I happened to own an idle armv8 server that I can donate to the numpy
> community; please let me know if that can improve numpy's CI/CD environment,
> and I also need somebody to help me set up the CI/CD environment on that server.
>
>
> Thank you for your kind offer!
>
> Someone else would be more qualified than I to comment on why the pool is
> empty, and what resources we have available.
>
> I know that AppVeyor allows you to host your own nodes, and it looks like
> Azure does too:
>
>
> https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/v2-linux?view=azure-devops
> https://www.appveyor.com/docs/server/#linux
>
> Stéfan
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


[Numpy-discussion] ANN: randomgen 1.19.0 release

2020-06-24 Thread Kevin Sheppard
randomgen 1.19.0 has been released with bug fixes and new features.

*New Features*

   - A helper class that lets users define custom bit generators in Python
   (slow) or numba (fast). This simplifies experimenting with alternative
   configurations. The UserBitGenerator can be used with
   numpy.random.Generator to produce variants from a wide range of
   distributions.
   https://bashtage.github.io/randomgen/custom-bit-generators.html
   - New bit generators:
  - PCG64DXSM - The cheap-multiplier variant that produces output
  before updating the state. This generator may become the default in NumPy
  in the future and is the 2.0 version of PCG64.
  - LCG128Mix - A 128-bit LCG with a settable multiplier, increment,
  and output function. This generator nests both PCG64 and PCG64DXSM as
  special cases. It also can act as a standard 128bit LCG/MCG.
  - EFIIX64 (x48 variant) - Chris Doty's stream cipher-like generator
  - SFC64 with settable counter increment which allows distinct streams
  to be produced from the same SeedSequence.
  - LXM - A generator that mixes the output of a 64 bit-LCG and a
  256bit Xorshift.
   - ExtendedGenerator contains methods for producing variants that are not
   included in numpy.random.Generator.

*Deprecations*
Both Generator and RandomState have been officially deprecated. The NumPy
versions are both better maintained and feature-rich.

*Other*
All bit generators have had their output tested to at least 1TB (many to 4TB),
both as single streams and interleaved with 4 or 8196 other bit generators.
When interleaved, the additional generators were constructed using both
SeedSequence.spawn and jumped (when available).

https://bashtage.github.io/randomgen/testing.html
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] reseed random generator (1.19)

2020-06-24 Thread Kevin Sheppard
Just call

rs = default_rng()

Again.
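
A minimal sketch of that suggestion, with an explicit seed added for
reproducibility (the seed value is illustrative, not from the original
question):

    import numpy as np

    rs = np.random.default_rng(12345)
    rs.standard_normal()                 # first draw

    rs = np.random.default_rng(12345)    # "reseed" by constructing a fresh Generator
    rs.standard_normal()                 # repeats the first draw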



On Wed, Jun 24, 2020, 20:31 Neal Becker  wrote:

> Consider the following:
>
> from numpy.random import default_rng
> rs = default_rng()
>
> Now how do I re-seed the generator?
> I thought perhaps rs.bit_generator.seed(), but there is no such attribute.
>
> Thanks,
> Neal
>
> --
> *Those who don't understand recursion are doomed to repeat it*
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] reseed random generator (1.19)

2020-06-29 Thread Kevin Sheppard
If you want to use the same entropy-initialized generator for
temporarily-reproducible experiments, then you can use

gen = np.random.default_rng()
state = gen.bit_generator.state
gen.standard_normal()
# 0.5644742559549797, will vary across runs
gen.bit_generator.state = state
gen.standard_normal()
# Always the same as before, 0.5644742559549797

The equivalent to the old way of calling seed to reseed is:

SEED = 918273645
gen = np.random.default_rng(SEED)
gen.standard_normal()
# 0.12345677
gen = np.random.default_rng(SEED)
gen.standard_normal()
# Identical value

Rather than reseeding the same object, you just create a new object. At some
point in the development of Generator both methods were timed and there was no
performance benefit to reusing the same object by reseeding.

Kevin


From: Neal Becker
Sent: Monday, June 29, 2020 1:01 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] reseed random generator (1.19)

I was using this to reset the generator, in order to repeat the same sequence
again for testing purposes.

On Wed, Jun 24, 2020 at 6:40 PM Robert Kern  wrote:
On Wed, Jun 24, 2020 at 3:31 PM Neal Becker  wrote:
Consider the following:

from numpy.random import default_rng
rs = default_rng()

Now how do I re-seed the generator?
I thought perhaps rs.bit_generator.seed(), but there is no such attribute.

In general, reseeding an existing generator instance is not a good practice.
What effect are you trying to accomplish? I assume that you are asking this
because you are currently using `RandomState.seed()`. In what circumstances?

The raw `bit_generator.state` property *can* be assigned to, in order to
support some advanced use cases (mostly involving de/serialization and similar
kinds of meta-programming tasks). It's also been helpful for me to construct
worst-case scenarios for testing parallel streams. But it quite deliberately
bypasses the notion of deriving the state from a human-friendly seed number.

--
Robert Kern

--
Those who don't understand recursion are doomed to repeat it
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] reseed random generator (1.19)

2020-06-29 Thread Kevin Sheppard
The best practice is to use a SeedSequence to spawn child SeedSequences, and
then to use these children to initialize your generators or bit generators.

from numpy.random import SeedSequence, Generator, PCG64, default_rng

entropy = 382193877439745928479635728

seed_seq = SeedSequence(entropy)
NUM_STREAMS = 2**15
children = seed_seq.spawn(NUM_STREAMS)
# if you want the current best bit generator, which may change
rngs = [default_rng(child) for child in children]
# If you want the most control across versions, set the bit generator.
# This uses PCG64, which is the current default. Each bit generator needs
# to be wrapped in a generator.
rngs = [Generator(PCG64(child)) for child in children]

Kevin


From: Evgeni Burovski
Sent: Monday, June 29, 2020 2:21 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] reseed random generator (1.19)

(apologies for jumping into a conversation)
So what is the recommendation for instantiating a number of generators
with manually controlled seeds?

The use case is running a series of MC simulations with reproducible
streams. The runs are independent and are run in parallel in separate
OS processes, where I do not control the time each process starts
(jobs are submitted to the batch queue), so default seeding seems
dubious?

Previously, I would just do roughly

seeds = [1234, 1235, 1236, ...]
rngs = [np.random.RandomState(seed) for seed in seeds]
...

and each process operates with its own `rng`.
What would be the recommended way with the new `Generator` framework?
A human-friendly way would be preferable if possible.

Thanks,

Evgeni
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] reseed random generator (1.19)

2020-06-29 Thread Kevin Sheppard
It can be anything, but "good practice" is to use a number that would have 2
properties:

- When expressed as a binary number, it would have a large number of both 0s and 1s
- The total number of digits in the binary representation is somewhere between 32 and 128.

The binary representation of the one I chose (by mashing numbers) is:

'0b1001001001011001110011100100011011100110000011000101000010001'

This has 43 0s and 46 1s. Many people just use 0, which is fine in the sense
that the stream should have the same properties as if any of the 2**N numbers
were chosen. Such simple choices, however, have a slight consequence in the
sense that they generate strange dependence across researchers if everyone
uses a small number of seeds (e.g., 0 or 1234).

Kevin


From: Evgeni Burovski
Sent: Monday, June 29, 2020 4:01 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] reseed random generator (1.19)

Thanks Kevin!

A possibly dumb follow-up question: in your example,

> entropy = 382193877439745928479635728

is it relevant that `entropy` is a long integer? I.e., what are the
constraints on its value, can one use

entropy = 1234 or
entropy = 0 or
entropy = 1

instead?

Re: [Numpy-discussion] NumPy Documentation Gallery structure

2020-07-08 Thread Kevin Sheppard
statsmodels uses a homegrown solution with some templating to make a gallery
of jupyter notebooks.
https://github.com/statsmodels/statsmodels/blob/master/docs/source/examples/index.rst


From: Saber Tooth
Sent: Wednesday, July 8, 2020 8:06 AM
To: Matti Picus
Cc: numpy-discussion@python.org; numpy-scipy-g...@googlegroups.com
Subject: Re: [Numpy-discussion] NumPy Documentation Gallery structure

Hi Matti,

Yup, Matplotlib and SunPy are using Sphinx Gallery 🙂. I am asking, could we
also have some discussion around such a structure, maybe modifying it in a way
more suited to the NumPy community? What do you think?

Thanks,
Mrinal

On Wed, 8 Jul, 2020, 12:09 pm Matti Picus,  wrote:
I think those projects use https://github.com/sphinx-gallery/sphinx-gallery
to do the layout

Matti

On 7/8/20 9:31 AM, Saber Tooth wrote:
Hello Melissa and Ralf,

I was just wondering if we haven't had much discussion on the structuring of
tutorials, while we have had some discussion on Explanations. I have been
analysing the Tutorials of some other communities too, say Matplotlib: they
have basically structured the tutorials page like a Gallery where examples
have been arranged in the order of experience level.
Link to Matplotlib tutorials: https://matplotlib.org/3.2.2/tutorials/index.html

Here is SunPy Documentation. They have tried to structure their How-To
examples in the same way, like a Gallery.
Link to SunPy Gallery Examples: https://docs.sunpy.org/en/v2.0.1/generated/gallery/index.html#

Please have a look, maybe we can also get some new ideas for our
Documentation. I think it will help us to structure the Tutorials page of
NumPy too.

Thanks,
Mrinal Tyagi
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy dtype API improvement suggestion

2020-07-26 Thread Kevin Sheppard
Better would be to have an object like NamedTuple in typing that would allow

class Point(DType):
    x: np.int16
    y: np.int16
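
For what it's worth, something close to the proposed spelling can already be
written outside NumPy; a rough sketch (the helper name `dtype_from_class` is
made up here, and this is not an existing NumPy API):

    import numpy as np

    def dtype_from_class(cls):
        # Build a structured dtype from the class's ordered annotations.
        return np.dtype([(name, typ) for name, typ in cls.__annotations__.items()])

    @dtype_from_class
    class Point:
        x: np.int16
        y: np.int16

    # Point is now np.dtype([('x', np.int16), ('y', np.int16)])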



On Sun, Jul 26, 2020 at 3:22 PM Eyal Kutz  wrote:

> I am interested in suggesting an API improvement for NumPy.
> I wish to make it so that the following code:
> @np.dtype
> class Point:
>     x: np.int16
>     y: np.int16
> would be equivalent to the following code:
> Point = np.dtype([('x', np.int16), ('y', np.int16)])
>
> I am willing to submit the code changes required to make this happen.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Replacement for Rackspace

2020-08-07 Thread Kevin Sheppard
The Rackspace hosted wheel endpoints at
https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com/ and
https://3f23b170c54c2533c070-1c8a9b3114517dc5fe17b7c3f8c63a43.ssl.cf2.rackcdn.com/
seem to not be working.  I know NumPy, SciPy, pandas and scikit-learn are all
using a common end point on anaconda.org.  Statsmodels is preparing for a
release, and the wheel builder at https://github.com/MacPython/statsmodels-wheels
is failing at upload.  Is there any shared resource for uploading nightlies
and release wheels?  Or should we just use a separate account on anaconda.org?

Thanks,
Kevin
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] new function: broadcast_shapes

2020-10-16 Thread Kevin Sheppard
There is at least one custom implementation in `Generator` IIRC, so a
formal API addition sounds like a good idea.
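
For reference, a small usage sketch of the proposed helper (assuming the name
np.broadcast_shapes, under which it eventually shipped in NumPy 1.20):

    import numpy as np

    np.broadcast_shapes((6, 7), (5, 6, 1), (7,))   # -> (5, 6, 7)
    np.broadcast_shapes((100, 1), (100,))          # -> (100, 100)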

Kevin


On Thu, Oct 15, 2020 at 8:52 PM Stephan Hoyer  wrote:

> On Thu, Oct 15, 2020 at 11:46 AM Warren Weckesser <
> warren.weckes...@gmail.com> wrote:
>
>> On 10/15/20, Madhulika Jain Chambers  wrote:
>> > Hello all,
>> >
>> > I opened a PR to add a function which returns the broadcasted shape
>> from a
>> > given set of shapes:
>> > https://github.com/numpy/numpy/pull/17535
>> >
>> > As this is a proposed change to the API, I wanted to see if there was
>> any
>> > feedback from the list.
>>
>>
>> Thanks, this is useful!  I've implemented something similar many times
>> over the years, and could have used it in some SciPy code, where we
>> currently have a private implementation in one of the `stats` modules.
>>
>> Warren
>
>
> +1 for adding this.
>
> There's a version of this helper function -- coincidentally with the
> exactly same API and name -- in JAX, too:
>
> https://github.com/google/jax/blob/22c3684d3bac3ad0f40aca69cdbc76842fa9dfc0/jax/lax/lax.py#L73
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: split numpy.distutils into it's own package

2020-10-24 Thread Kevin Sheppard
Separating it is definitely a good idea.  The only thing that I think would
be better would be if the key features that are not in setuptools could be
added there, so that NumPy's distutils could eventually be retired.

Kevin

On Sat, Oct 24, 2020, 08:12 Dustin Spicuzza 
wrote:

> Hey all,
>
> [copied from https://github.com/numpy/numpy/issues/17620, as requested
> by the feature request guidelines]
>
> Cross-compiling scipy and other projects that depend on numpy's
> distutils is a huge pain right now, because to do it [in addition to
> lots of other details that you have to get right] you have to have both
> a native and cross-compiled version of numpy installed. It seems pretty
> unreasonable that I need a native version of numpy installed to compile
> scipy. One might ask, why is this needed?
>
> Well, scipy's setup.py uses numpy's distutils fork for the fortran
> support (I think). Because numpy.distutils is a subpackage of numpy, if
> you're working with a cross-compiled version of numpy, it eventually
> tries to import some .so that wasn't compiled for your host and
> everything dies -- thus you have to have a native numpy installed to
> allow numpy.distutils to import correctly.
>
> As far as I can tell, numpy's distutils fork is pure python and doesn't
> actually use anything else in numpy itself, and so is completely
> separable from numpy. If it were its own top-level package that scipy et
> al could use, then cross-compiling would be significantly less tricky.
>
> The mechanics of this would work like so:
>
> * contents of numpy.distutils would be moved to 'numpy_distutils' package
> * importing numpy.distutils would raise a deprecation warning and
> redirect to numpy_distutils, probably using import hook magic
> * scipy and other packages would now utilize numpy_distutils instead of
> numpy.distutils at build time
>
> Initially, I considered proposing creating a separate pypi package for
> numpy_distutils, but I expect that would be more trouble than it's
> worth. One could also propose creating a new PEP 517 build backend, or
> move to cmake, or other huge changes, but those also seem more trouble
> than they're worth.
>
> Thanks for your consideration.
>
> Dustin
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: split numpy.distutils into it's own package

2020-10-25 Thread Kevin Sheppard
NumPy could take an explicit runtime dependency on numpy-distutils so that
the code would technically be in a different repo but would always be
available through NumPy. Eventually this could be removed as a runtime
dependency.

Kevin


On Sun, Oct 25, 2020, 09:23 Matti Picus  wrote:

>
> On 10/25/20 10:46 AM, Dustin Spicuzza wrote:
> > I took a first stab at it, and... surprise, surprise, there were a few
> > more warts than I had originally expected in my initial survey. The
> > biggest unexpected result is that numpy.f2py would need to also be a
> > toplevel package. I did get the refactor cross-compiled and started on
> > scipy, but there's a few more issues that will have to be resolved on
> > the scipy side.
> >
> > I posted a detailed set of notes on the issue (#17620) and made a draft
> > PR with my initial results (#17632) if you want to get a sense for how
> > invasive this is (or isn't depending on your point of view).
> >
> > Dustin
> >
> Is there a way to do this without modifying SciPy? That would reassure
> me that this change will not break other peoples' workflow. It is hard
> to believe that only SciPy uses numpy.distutils. If the changes break
> backward compatibility, they need to be done like any other deprecation:
> warn for 4 releases (two years) before actually breaking workflows.
>
>
> Matti
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should we transfer MacPython/numpy-wheels to the NumPy org?

2020-10-29 Thread Kevin Sheppard
Hi Matti, Could you migrate the statsmodels-wheels repo? ThanksKevin  From: Matti PicusSent: Thursday, October 29, 2020 4:19 PMTo: numpy-discussion@python.orgSubject: Re: [Numpy-discussion] Should we transfer MacPython/numpy-wheels to the NumPy org?  On 10/29/20 4:42 PM, Charles R Harris wrote: My thought was administration, which was triggered by the need to move to travis-ci.com. Note that the travis app shows up on the MacPython site, and apparently is available for scipy-wheels, but is not visible to numpy-wheels. My guess is that an owner of MacPython needs to add the repo and migrate it. Chuck Done. https://travis-ci.com/github/MacPython/numpy-wheels/buildsI also migrated openblas-libs.If any of the other MacPython/* repos wish to migrate let me know.Matti 
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Efficient way to draw multinomial distribution random samples

2020-11-01 Thread Kevin Sheppard
This is in the pending PR.  Hopefully out in 1.20.

Kevin


On Sun, Nov 1, 2020, 23:55 Currurant  wrote:

> I realized that neither numpy.random.multinomial nor rng.multinomial has
> the
> ability to draw from different multinomial distributions at the same time
> like what MATLAB mnrnd() does here:
>
> https://www.mathworks.com/help/stats/mnrnd.html
>
> Also, I have asked this question on StackOverFlow:
>
>
> https://stackoverflow.com/questions/64529620/is-there-an-efficient-way-to-generate-multinomial-random-variables-in-parallel?noredirect=1#comment114131565_64529620
>
> It seems like this is something good to add to numpy.random, since it would
> be much more faster when you have many multinomial distributions to draw
> from---using loops.
>
>
>
> --
> Sent from: http://numpy-discussion.10968.n7.nabble.com/
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy 1.20.x branch in two weeks

2020-11-01 Thread Kevin Sheppard
I also feel the NEP is very useful since it helps downstream to drop older
versions relatively early.  I've found this very useful in projects that have
less maintenance bandwidth.

Kevin


From: Jarrod Millman
Sent: Monday, November 2, 2020 2:56 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] NumPy 1.20.x branch in two weeks

I also misunderstood the purpose of the NEP.  I assumed it was
intended to encourage projects to drop old versions of Python.  Other
people have viewed the NEP similarly:
https://github.com/networkx/networkx/issues/4027

If the intention of the NEP is to specify that projects not drop old
versions of Python too early, I don't think it is obvious from the NEP.
It would be helpful if you added a simple motivation statement near
the top of the document.  Something like:

## Motivation and Scope

The purpose of the NEP is to ensure projects in the scientific Python
ecosystem don't drop support for old versions of Python and NumPy too
soon.

On Sun, Nov 1, 2020 at 6:44 PM Jarrod Millman  wrote:
>
> NetworkX is currently planning to support 3.6 for our coming 2.6
> release (dec 2020) and 3.0 release (early 2021).  We had originally
> thought about following NEP 29.  But I assumed it had been abandoned,
> since neither NumPy nor SciPy dropped Python 3.6 on Jun 23, 2020.
>
> NetworkX is likely to continue supporting whatever versions of Python
> both NumPy and SciPy support regardless of what NEP 29 says.  I
> wouldn't be surprised if other projects do the same thing.
>
> Jarrod
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How did Numpy get its latest version of the documentation to appear at the top of Google search results?

2020-11-13 Thread Kevin Sheppard
I'm not sure if NumPy got these in place, but you could add a canonical link
tag (a <link rel="canonical" ...> element) to each of your pages. It would
require some sphinx mods to your theme. This would tell all search engines
that they should refer to the latest as the authoritative page for competing
pages. See https://yoast.com/rel-canonical/

Kevin


From: efremdan1
Sent: Friday, November 13, 2020 4:36 PM
To: numpy-discussion@python.org
Subject: [Numpy-discussion] How did Numpy get its latest version of the documentation to appear at the top of Google search results?

I'm working with Bokeh (https://docs.bokeh.org/en/latest/), another
open-source Python package. The developers would like to have the latest
version of their documentation appear at the top of Google search results
when users search for information, but knowledge of how to do this is
lacking.

I've noticed that Numpy seems to have gotten this problem figured out, e.g.,
googling "numpy interpolate" results in the first hit being
https://numpy.org/doc/stable/reference/generated/numpy.interp.html. This is
unlike Python itself, where googling "python string formatting" results in
the first hit being https://docs.python.org/3.4/library/string.html.

So apparently someone in the Numpy developer world knows how to set up the
doc pages in a manner that allows for this. Would that person be willing to
post to the Bokeh message board on the topic
(https://discourse.bokeh.org/t/some-unsolicited-feedback/6643/17) with some
advice?

Thank you!

--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
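
As a concrete illustration of one way to emit such a tag (a sketch, not
necessarily what NumPy itself does): Sphinx can generate the canonical link
for you from the html_baseurl setting.

    # conf.py -- minimal sketch; assumes Sphinx >= 1.8, where setting
    # html_baseurl makes the HTML builder emit a <link rel="canonical"> tag
    # on every page, pointing at the corresponding page under this base URL.
    html_baseurl = "https://docs.bokeh.org/en/latest/"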
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] problem with numpy 1.19.4 install via pip on Win 10

2020-12-03 Thread Kevin Sheppard
I believe the next release (1.19.x or 1.20.0, whichever is first) of NumPy
should have OpenBLAS 0.3.12 built using a smaller buffer size, which will make
this import error stop appearing on Windows, irrespective of the MS-provided
fix.

From: Oscar Benjamin
Sent: Friday, December 4, 2020 12:05 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] problem with numpy 1.19.4 install via pip on Win 10

The final message in the numpy issue suggests a possible fix:
https://github.com/numpy/numpy/issues/16744#issuecomment-727098973

"""
My preference would be to have a 1.19.5 that uses OpenBLAS 0.3.9 on
Linux and 0.3.12 on Windows, but I don't know if this is possible.
"""

I don't know if that would fix the problem or how hard that would be
to implement, but I am seeing an increasing flow of problem reports in
different fora about this issue and I'm surprised that the plan is to
wait for MS to (possibly) release a fix in two months' time.

--
Oscar

On Thu, 3 Dec 2020 at 21:14, Alan G. Isaac  wrote:
>
> "current expectation is that this [fix] will be able to be released near the end of January 2021"
> !
>
> On 12/2/2020 7:13 PM, Ilhan Polat wrote:
> > Yes this is known and we are waiting for MS to roll out a solution for this. Here are more details
> > https://developercommunity2.visualstudio.com/t/fmod-after-an-update-to-windows-2004-is-causing-a/1207405
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?

2020-12-14 Thread Kevin Sheppard
You need to reacquire the gil, then you can get the lock and rerelease the
gil.

I think this works (on phone, so untested)

with gil:
    with nogil, lock:
        ...

Kevin


On Mon, Dec 14, 2020, 13:37 Evgeni Burovski 
wrote:

> Hi,
>
> What would be the correct way of locking the bit generator of
> np.random.Generator in cython's nogil context?
> (This is threading 101, surely, so please forgive my ignorance).
>
> The docs for extending np.random.Generator in cython
> (https://numpy.org/doc/stable/reference/random/extending.html#cython)
> recommend the following idiom for generating uniform variates, where
> the GIL is released and a Generator-specific lock is held:
>
> x = PCG64()
> rng =  PyCapsule_GetPointer(x.capsule, capsule_name)
> with nogil, x.lock:
> rng.next_double(rng.state)
>
> What is the correct way of locking it when already in the nogil
> section (so that x.lock is not accessible)?
>
> The use case is a long-running MC process which generates random
> variates in a tight loop (so the loop is all nogil). In case it
> matters, I probably won't be using python threads, but may use
> multiprocessing.
>
> Basically,
>
> cdef double uniform(self) nogil:
>     if self.idx >= self.buf.shape[0]:
>         self._fill()
>     cdef double value = self.buf[self.idx]
>     self.idx += 1
>     return value
>
> cdef void _fill(self) nogil:
>     self.idx = 0
>     # HERE: Lock ?
>     for i in range(self.buf.shape[0]):
>         self.buf[i] = self.rng.next_double(self.rng.state)
>
>
> Thanks,
> Evgeni
>
>
> P.S. The full cdef class, for completeness:
>
> cdef class RndmWrapper():
>     cdef:
>         double[::1] buf
>         Py_ssize_t idx
>         bitgen_t *rng
>         object py_gen  # keep the garbage collector away
>
>     def __init__(self, seed=(1234, 0), buf_size=4096, bitgen_kind=None):
>         if bitgen_kind is None:
>             bitgen_kind = PCG64
>
>         # cf Numpy-discussion list, K.~Sheppard, R.~Kern, June 29, 2020 and below
>         # https://mail.python.org/pipermail/numpy-discussion/2020-June/080794.html
>         entropy, num = seed
>         seed_seq = SeedSequence(entropy, spawn_key=(num,))
>         py_gen = bitgen_kind(seed_seq)
>
>         # store the python object to avoid it being garbage collected
>         self.py_gen = py_gen
>
>         capsule = py_gen.capsule
>         self.rng = PyCapsule_GetPointer(capsule, capsule_name)
>         if not PyCapsule_IsValid(capsule, capsule_name):
>             raise ValueError("Invalid pointer to anon_func_state")
>
>         self.buf = np.empty(buf_size, dtype='float64')
>         self._fill()
>
>     @cython.boundscheck(False)
>     @cython.wraparound(False)
>     cdef void _fill(self) nogil:
>         self.idx = 0
>         for i in range(self.buf.shape[0]):
>             self.buf[i] = self.rng.next_double(self.rng.state)
>
>     @cython.boundscheck(False)
>     @cython.wraparound(False)
>     cdef double uniform(self) nogil:
>         if self.idx >= self.buf.shape[0]:
>             self._fill()
>         cdef double value = self.buf[self.idx]
>         self.idx += 1
>         return value
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Installing numpy

2020-12-14 Thread Kevin Sheppard
NumPy does not support Python 2.7. If you must use 2.7 you need to use NumPy
1.16.6.  Otherwise, you should probably be using Python 3.8 for new work.

Kevin


From: Lianyuan Zheng
Sent: Monday, December 14, 2020 2:15 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Installing numpy

Hello,

When I run the command "python numpy install", it shows the following error:

Traceback (most recent call last):
  File "setup.py", line 27, in
    import versioneer
  File "./numpy/versioneer.py", line 1739
    file=sys.stderr)
        ^
SyntaxError: invalid syntax

Is this caused by the python version installed too old (v2.7.17) or other?

Thanks in advance!
Lianyuan

On Fri, Dec 11, 2020 at 2:49 PM Stanley Seibert  wrote:
The development version of NumPy from Github requires Python 3.7 or later.

On Fri, Dec 11, 2020 at 1:35 PM Lianyuan Zheng  wrote:
Hello,

On my linux server, I downloaded the NUMPY package from GitHub (git clone
https://github.com/numpy/numpy.git) and then accessed the directory "numpy".
When I typed command "python setup.py install", it shows the following message:

  File "setup.py", line 51
    f"NumPy {VERSION} may not yet support Python "
                                                 ^
SyntaxError: invalid syntax

Obviously the installation failed.  Which python version is needed for this
numpy package?  The python version installed is version 2.7.5.

Thanks,
Lianyuan
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?

2020-12-14 Thread Kevin Sheppard
I don't think it's that strange.  You always need the GIL before you interact
with Python code. The lock is a Python object, and so you need the GIL.  You
could redesign your code to always use different bit generators so that you
would never use the same one, in which case you wouldn't need to worry about
the lock.

I also think that the lock only matters for multithreaded code, not
multiprocess.  I believe the latter pickles and unpickles any Generator object
(and the underlying BitGenerator) and so each process has its own version.
Note that when multiprocessing the recommended procedure is to use spawn() to
generate a sequence of BitGenerators and to use a distinct BitGenerator in
each process. If you do this then you are free from the lock.

Kevin


From: Evgeni Burovski
Sent: Monday, December 14, 2020 2:10 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?

On Mon, Dec 14, 2020 at 4:46 PM Kevin Sheppard wrote:
>
> You need to reacquire the gil, then you can get the lock and rerelease the gil.
>
> I think this works (on phone, so untested)
>
> with gil:
>     with nogil, lock:
>         ...

Thanks Kevin.
This surely works, but feels seriously weird. Is this the only / the
recommended way?
I can of course adjust the buffer size to amortize the time for the
GIL manipulation, but this really starts looking like a code smell.
My original motivation was to have inner simulation loops python-free.
Most likely, the issue is that I'm not using the Generator correctly?

Evgeni

Re: [Numpy-discussion] Deprecate np.testing.dec

2020-12-23 Thread Kevin Sheppard
Have you performed a search on GitHub to look for use of the decorators?  I
think external use is more of a concern than internal.

Kevin


From: rpolley
Sent: Wednesday, December 23, 2020 6:05 PM
To: numpy-discussion@python.org
Subject: [Numpy-discussion] Deprecate np.testing.dec

Saw some discussion in this issue that since the decorators in np.testing.dec
are there to support the nose testing framework, and since we use pytest
mainly for testing now, that the decorators there should be deprecated. I
created a pull request that does this, and wanted to get the mailing list's
opinion before it went forward. Any thoughts, suggestions, or objections?

Sent from the Numpy-discussion mailing list archive at Nabble.com.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to speed up array generating

2021-01-09 Thread Kevin Sheppard
What about arange and then an integer divide or mod?
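
A quick sketch of that idea (my own illustration, not benchmarked here): build
one index vector with arange and use mod / floor-division to index m_0.

    import numpy as np

    N = 3
    m_0 = np.arange(N)

    idx = np.arange(N * N)
    m_1 = m_0[idx % N]    # [0, 1, 2, 0, 1, 2, 0, 1, 2], like np.tile
    m_2 = m_0[idx // N]   # [0, 0, 0, 1, 1, 1, 2, 2, 2], like np.repeat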

Kevin


On Sat, Jan 9, 2021, 20:54  wrote:

> np.meshgrid, indexing, reshape
>
> 09.01.2021, 22:30, "Joseph Fox-Rabinovitz" :
>
> What other ways have you tried?
>
> On Sat, Jan 9, 2021 at 2:15 PM  wrote:
>
> Hello. There is a random 1D array m_0 with size 3000, for example:
>
> m_0 = np.array([0, 1, 2])
>
> I need to generate two 1D arrays:
>
> m_1 = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
> m_2 = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
>
> Is there faster way to do it than this one:
>
> import numpy as npimport time
> N = 3
> m_0 = np.arange(N)
>
> t = time.time()
> m_1 = np.tile(m_0, N)
> m_2 = np.repeat(m_0, N)
> t = time.time() - t
>
> I tried other ways but they are slower or have the same time. Other NumPy 
> operations in my code 10-100 times faster. Why the repeating an array is so 
> slow? I need 10 times speed up. Thank you for your attantion to my problem.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> ,
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to speed up array generating

2021-01-09 Thread Kevin Sheppard
Actually I would try a broadcast multiply followed by ravel first.
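
A rough sketch of what that looks like (illustrative, not benchmarked here);
multiplying by a vector of ones broadcasts m_0 into a 2-D array whose ravel
gives the tiled and repeated layouts:

    import numpy as np

    N = 3
    m_0 = np.arange(N)
    ones = np.ones(N, dtype=m_0.dtype)

    m_1 = (ones[:, None] * m_0[None, :]).ravel()   # [0, 1, 2, 0, 1, 2, 0, 1, 2]
    m_2 = (m_0[:, None] * ones[None, :]).ravel()   # [0, 0, 0, 1, 1, 1, 2, 2, 2]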

Kevin


On Sat, Jan 9, 2021, 21:12 Kevin Sheppard 
wrote:

> What about arange and then an integer divide or mod?
>
> Kevin
>
>
> On Sat, Jan 9, 2021, 20:54  wrote:
>
>> np.meshgrid, indexing, reshape
>>
>> 09.01.2021, 22:30, "Joseph Fox-Rabinovitz" :
>>
>> What other ways have you tried?
>>
>> On Sat, Jan 9, 2021 at 2:15 PM  wrote:
>>
>> Hello. There is a random 1D array m_0 with size 3000, for example:
>>
>> m_0 = np.array([0, 1, 2])
>>
>> I need to generate two 1D arrays:
>>
>> m_1 = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
>> m_2 = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
>>
>> Is there faster way to do it than this one:
>>
>> import numpy as npimport time
>> N = 3
>> m_0 = np.arange(N)
>>
>> t = time.time()
>> m_1 = np.tile(m_0, N)
>> m_2 = np.repeat(m_0, N)
>> t = time.time() - t
>>
>> I tried other ways but they are slower or have the same time. Other NumPy 
>> operations in my code 10-100 times faster. Why the repeating an array is so 
>> slow? I need 10 times speed up. Thank you for your attantion to my problem.
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>> ,
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.linalg interference with matplot.lib

2021-01-11 Thread Kevin Sheppard
Are you using NumPy <= 1.19.3 and Windows 2004 or 20H2?  If so, you should
upgrade to NumPy 1.19.5.

Kevin


On Mon, Jan 11, 2021 at 10:51 AM Frederix96  wrote:

> Dear numpy community,
>
> I am having an issue with numpy.linalg where a matrix inversion of a
> complex matrix using numpy.linalg.inv results in containing nan values
> when a plot of any kind was performed beforehand by matplotlib. For the
> specific details on the problem and what I have tried already see the
> stackoverflow post I started a month ago:
>
>
> https://stackoverflow.com/questions/65183984/why-does-plotting-something-with-matplotlib-change-the-first-outcome-of-a-numpy
>
> I know that this problem has to do with some specifics of the maschine I
> am using, as it does not happen on other maschines I tried. However it
> still appears after rebuilding my entire python enviroment. Hence I ran
> out of options appart from setting up my whole PC from scratch again,
> which I would rather not do.
>
> Is some problem like this known in the community?
>
> Has anyone an idea at which other specifications of my system it might
> be worth taking a look at?
>
> Thanks a lot in advance!
>
> Best regards,
>
> Frederix96
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-06 Thread Kevin Sheppard
Have you benchmarked it using the generator interface? The structure of
this as a non-monolithic generator makes it a good deal slower than
generating in straight C (with everything inlined).  I'm not sure a
factor of 2 is enough to justify a change (for me 10x would be and 1.2x
would not, but I don't know where the cutoff is).

Can you post benchmarks from using it through Generator?

Also, those tests would be replaced with new values if the patch was
accepted, so don't worry about them.

Kevin


On Sat, Feb 6, 2021, 09:32  wrote:

> I tried to implement a different implementation of the ziggurat method for
> generating standard normal distributions that is about twice as fast and
> uses 2/3 of the memory than the old one.
> I tested the implementation separately and am very confident it's correct,
> but it does fail 28 test in coverage testing.
> Checking the testing code I found out that all the failed tests are inside
> TestRandomDist which has the goal of "Make[ing] sure the random
> distribution returns the correct value for a given seed". Why would this be
> needed?
> The only explanation I can come up with is that it's standard_normal is,
> in regards to seeding, required to be backwards compatible. If that's the
> case how would, could one even implement a new algorithm?
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-08 Thread Kevin Sheppard
If I understand correctly, there is no gain when applying this patch to
Generator.  I'm not that surprised that this is the case, since the compiler
is much more limited in what it can do in Generator than what it can do when
filling a large array directly with functions available for inlining and
unrolling. Again, if I understand correctly, I think it will be difficult to
justify breaking the stream for a negligible gain in performance.

Kevin


On Sat, Feb 6, 2021 at 12:27 PM  wrote:

> Well, I can tell you why it needs to be backward compatible!  I use random
> numbers fairly frequently, and to unit test them I set a specific seed and
> then make sure I get the same answers.
>
> Hmm, I guess that makes sense. I tried to adjust my algorithms to do the
> same thing with the same bit's as the original one, but I couldn't get it
> to work.
>
> Have you benchmarked it using the generator interface? The structure of
> this as a no monolithic generator makes it a good deal slower than
> generating in straight C (with everything inline).  While I'm not sure a
> factor of 2 is enough to justify a change (for me 10x, 1.2x is not but I
> don't know where the cutoff is).
>
>
> I originally benchmarked my implementation against a bunch of other ones
> in c (because I was developing a c library
> https://github.com/camel-cdr/cauldron/blob/main/cauldron/random.h#L1928).
> But I did run the built-in benchmark: ./runtests.py --bench
> bench_random.RNG.time_normal_zig and the results are:
>
>   new   old
> PCG64  589±3μs 1.06±0.03ms
> MT19937 985±4μs 1.44±0.01ms
> Philox 981±30μs1.39±0.01ms
> SFC64  508±4μs   900±4μs
> numpy2.99±0.06ms   2.98±0.01ms # no change for /dev/urandom
>
>
> I'm not yet 100% certain about the implementations, but I attached a diff
> of my current progress.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-08 Thread Kevin Sheppard
My reading is that the first 4 are pure C, presumably using the standard
practice of inlining so as to make the tightest loop possible, and to allow
the compiler to make other optimizations.  The final line is what happens when
you replace the existing ziggurat in NumPy with the new one. I read it this
way since it has both "new" and "old" with numpy. If it isn't this, then I'm
unsure what "new" and "old" could mean in the context of this thread.

I suppose camel-cdr can clarify what was actually done.

Kevin


From: Robert Kern
Sent: Monday, February 8, 2021 3:41 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Question about optimizing random_standard_normal

On Mon, Feb 8, 2021 at 3:05 AM Kevin Sheppard <kevin.k.shepp...@gmail.com> wrote:
If I understand correctly, there is no gain when applying this patch to
Generator.  I'm not that surprised that this is the case since the compiler is
much more limited in what it can do in Generator than what it can when filling
a large array directly with functions available for inlining and unrolling.
Again, if I understand correctly I think it will be difficult to justify
breaking the stream for a negligible gain in performance.

Can you explain your understanding of the benchmark results? To me, it looks
like nearly a 2x improvement with the faster BitGenerators (our default PCG64
and SFC64). That may or may not be worth breaking the stream, but it's far
from negligible.

But I did run the built-in benchmark: ./runtests.py --bench
bench_random.RNG.time_normal_zig and the results are:

           new          old
PCG64      589±3μs      1.06±0.03ms
MT19937    985±4μs      1.44±0.01ms
Philox     981±30μs     1.39±0.01ms
SFC64      508±4μs      900±4μs
numpy      2.99±0.06ms  2.98±0.01ms  # no change for /dev/urandom

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Question about optimizing random_standard_normal

2021-02-08 Thread Kevin Sheppard
That is good news indeed.  Seems like a good upgrade, especially given the
breadth of application of normals and the multiple appearances within
distributions.c (e.g., Cauchy).

Is there a deprecation for a change like this? Or is it just a note and new
random numbers in the next major?  I think this is the first time a
substantially new algo has replaced an existing one.

Kevin


From: Robert Kern
Sent: Monday, February 8, 2021 4:06 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Question about optimizing random_standard_normal

On Mon, Feb 8, 2021 at 10:53 AM Kevin Sheppard <kevin.k.shepp...@gmail.com> wrote:
My reading is that the first 4 are pure C, presumably using the standard
practice of inlining so as to make the tightest loop possible, and to allow
the compiler to make other optimizations.  The final line is what happens when
you replace the existing ziggurat in NumPy with the new one. I read it this
way since it has both "new" and "old" with numpy. If it isn't this, then I'm
unsure what "new" and "old" could mean in the context of this thread.

No, these are our benchmarks of `Generator`. `numpy` is testing `RandomState`,
which wasn't touched by their contribution.

https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_random.py#L93-L97
https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_random.py#L123-L124

I suppose camel-cdr can clarify what was actually done.

But I did run the built-in benchmark: ./runtests.py --bench
bench_random.RNG.time_normal_zig and the results are:

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ENH: Proposal to add KML_BLAS support

2021-02-22 Thread Kevin Sheppard
It is probably best to not send around xlsx files.  Perhaps reformat as a
table in an email?

Kevin


From: ChunLin Fang
Sent: Monday, February 22, 2021 12:16 PM
To: numpy-discussion@python.org
Subject: [Numpy-discussion] ENH: Proposal to add KML_BLAS support

Part of the performance benchmark results.
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Guidelines for floating point comparison

2021-02-24 Thread Kevin Sheppard
In my experience it is most common to use reasonable but not exceedingly tight
bounds in complex applications where there isn't a proof that the maximum
error must be smaller than some number.  I would also caution against using a
single system to find the tightest tolerance a test passes at.  For example,
if you can pass at an rtol of 1e-13 on Linux/AMD64/GCC 9, then you might want
to set a tolerance around 1e-11 so that you don't get caught out on other
platforms. Notoriously challenging platforms in my experience (mostly from
statsmodels) are 32-bit Windows, 32-bit Linux and OSX (and I suspect OSX/ARM64
will be another difficult one).  This advice is moot if you have a precise
bound for the error.

Kevin


From: Ralf Gommers
Sent: Wednesday, February 24, 2021 12:25 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Guidelines for floating point comparison

On Wed, Feb 24, 2021 at 11:29 AM Bernard Knaepen  wrote:
Hi all,

We are developing a code that heavily relies on NumPy. Some of our regression
tests rely on floating point number comparisons and we are a bit lost in
determining how to choose atol and rtol (we are trying to do all operations in
double precision). We would like to set atol and rtol as low as possible but
still have the tests pass if we run on different architectures or introduce
some 'cosmetic' changes like using different similar NumPy routines.

For example, let's say we want some powers of the matrix A and compute them as:

A = np.array(some_array)
A2 = np.dot(A, A)
A3 = np.dot(A2, A)
A4 = np.dot(A3, A)

If we alternatively computed A4 like:

A4 = np.linalg.matrix_power(A, 4),

we get different values in our final outputs because obviously the operations
are not equivalent up to machine accuracy.

Is there any reference that one could share providing guidelines on how to
choose reasonable values for atol and rtol in this kind of situation? For
example, does the NumPy package use a fixed set of values for its own
development? The default ones?

I don't think there's a clear guide in docs or blog post anywhere. You can get
a sense of what works by browsing the unit tests for numpy and scipy.
numpy.linalg, scipy.linalg and scipy.special are particularly relevant
probably. For a rough rule of thumb: if you test on x86_64 and precision is on
the order of 1e-13 to 1e-16, then set a relative tolerance 10 to 100 times
higher to account for other hardware, BLAS implementations, etc.

Cheers,
Ralf

Thanks in advance for any help,
Cheers,
Bernard.
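
To make the kind of check being discussed concrete, here is a small sketch
(the matrix size and the rtol/atol values are illustrative assumptions, not
recommendations from this thread):

    import numpy as np

    A = np.random.default_rng(0).standard_normal((50, 50))
    A4_dot = np.dot(np.dot(np.dot(A, A), A), A)      # repeated np.dot
    A4_pow = np.linalg.matrix_power(A, 4)            # "cosmetically" different route

    # Loose enough to absorb rounding differences across BLAS builds and
    # platforms, tight enough to catch real regressions.
    np.testing.assert_allclose(A4_pow, A4_dot, rtol=1e-10, atol=1e-8)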
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String accessor methods

2021-03-07 Thread Kevin Sheppard
I think that and string functions that are exposed from an ndarray would
have to be guaranteed to work in-place. Requiring casting to objects to use
the methods feels more like syntactic sugar than an essential case. I think
most of the ones mentioned are low performance and can't take advantage of
the storage as a blob of int8 (ASCII) or int32 (UTF-32) that underlies NumPy
string arrays.
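
A quick illustration (mine, not from the thread) of the fixed-width storage being referred to:

import numpy as np

s = np.array(["test first", "test second", "test third"])
print(s.dtype)           # <U11: fixed width of 11 code points per element
print(s.dtype.itemsize)  # 44 bytes per element, i.e. 4 bytes (UTF-32) per code point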

I also think the existence of these in pandas reduces the case for them
being in Numpy.

On Sun, Mar 7, 2021, 05:32 Todd  wrote:

> On Sat, Mar 6, 2021 at 12:57 PM dan_patterson 
> wrote:
>
>> They are in np.char
>>
>> mystr = np.array(["test first", "test second", "test third"])
>>
>> np.char.title(mystr)
>> array(['Test First', 'Test Second', 'Test Third'], dtype='<U11')
>
> I mentioned those in my email, but they are far less convenient to use
> than class methods, nor do they relate well to how built-in strings are
> used in Python. That is why other projects have started using accessor
> methods and why Python removed all the separate string functions in Python
> 3. The functions in np.char are also limited in their capabilities, and
> fairly poorly documented in my opinion.  Some of those limitations are
> impossible to overcome, for example they inherently can never support
> operators, addition or multiplication, or slicing like Python strings can,
> while an accessor could.
>
> However, putting them as top-level methods for ndarray would pollute the
> methods too much. That is why I am suggesting numpy do the same thing that
> pandas, xarray, etc. are doing and putting those as methods under a 'str'
> attribute for ndarrays rather than as separate functions.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Using logfactorial instead of loggamma in random_poisson sampler

2021-03-08 Thread Kevin Sheppard
I did a quick test and using random_loggam was about 6% faster than using
logfactorial (on Windows).

Kevin


On Sun, Mar 7, 2021 at 2:40 AM Robert Kern  wrote:

> On Sat, Mar 6, 2021 at 1:45 PM Warren Weckesser <
> warren.weckes...@gmail.com> wrote:
>
>> At the time, making that change was not a high priority, so I didn't
>> pursue it. It does make sense to use the logfactorial function there,
>> and I'd be happy to see it updated, but be aware that making the
>> change is more work than changing just the function call.
>>
>
> Does it make a big difference? Per NEP 19, even in `Generator`, we do
> weigh the cost of changing the stream reasonably highly. Improved accuracy
> is likely worthwhile, but a minor performance improvement is probably not.
>
> --
> Robert Kern
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Dot + add operation

2021-03-31 Thread Kevin Sheppard
Or just use SciPy's get_blas_funcs to access *gemm, which directly exposes
this function:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.blas.dgemm.html
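
As a minimal sketch (mine, not from the thread) of the C += A @ B pattern with dgemm; note that c must be Fortran-ordered for the update to happen in place:

import numpy as np
from scipy.linalg.blas import dgemm

rng = np.random.default_rng(0)
A = rng.random((1000, 1000))
B = rng.random((1000, 1000))
C = np.asfortranarray(rng.random((1000, 1000)))

# beta=1.0 keeps the existing contents of C; overwrite_c=True lets BLAS
# write the result into C's buffer instead of allocating a new array.
C = dgemm(alpha=1.0, a=A, b=B, beta=1.0, c=C, overwrite_c=True)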

Kevin


On Wed, Mar 31, 2021 at 12:35 PM Ralf Gommers 
wrote:

>
>
> On Wed, Mar 31, 2021 at 2:35 AM Guillaume Bethouart <
> guillaume.bethou...@eshard.com> wrote:
>
>> Is it possible to add a method to perform a dot product and add the
>> result to an existing matrix in a single operation ?
>>
>> Like C = dot_add(A, B, C) equivalent to C += A @ B.This behavior is
>> natively proposed by the Blas *gemm primitive.
>>
>> The goal is to reduce the peak memory consumption. Indeed, during the
>> computation of C += A @ B, the maximum allocated memory is twice the size
>> of C.Using *gemm to add directly the result , the maximum memory
>> consumption is less than 1.5x the size of C.
>> This difference is significant for large matrices.
>>
>> Any people interested in it ?
>>
>
> Hi Guillaume, such fused operations cannot easily be done with NumPy
> alone, and it does not make sense to add separate APIs for that purpose
> because there are so many combinations of function calls that one might
> want to fuse.
>
> Instead, Numba, Pythran or numexpr can add this to some extent for numpy
> code. E.g. search for "loop fusion" in the Numba docs.
>
> Cheers,
> Ralf
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] two questions about `choose`

2021-04-17 Thread Kevin Sheppard
1. I suppose it only uses the (Native int or int64) dtype since each one
would need a code path to run quickly.

2. I would describe this as a bug. I think sequences are converted to arrays
and in this case the conversion is not returning a 2 element object array
but expanding and then concatenating.

Kevin


On Sat, Apr 17, 2021, 18:56 Alan G. Isaac  wrote:

> 1. Is there a technical reason for `choose` not accept a `dtype` argument?
>
> 2. Separately, mypy is unhappy with my 2nd argument to `choose`:
> Argument 2 to "choose" has incompatible type "Tuple[int, Sequence[float]]";
> expected "Union[Union[int, float, complex, str, bytes, generic],
> Sequence[Union[int, float, complex, str, bytes, generic]],
> Sequence[Sequence[Any]],_SupportsArray]"
> However, `choose` is happy to have e.g. `choices=(0,seq)` (and I hope it
> will remain so!).
>
> E.g.,
> a = (0,1,1,0,0,0,1,1)  # binary array
> np.choose(a, (0, range(8)))  # array([0, 1, 2, 0, 0, 0, 6, 7])
>
> Thanks, Alan Isaac
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] two questions about `choose`

2021-04-18 Thread Kevin Sheppard
Oh. I answered thinking about choice and not choose. Please ignore both
parts.

On Sun, Apr 18, 2021, 17:56 Robert Kern  wrote:

> On Sat, Apr 17, 2021 at 4:28 PM Kevin Sheppard 
> wrote:
>
>> 1. I suppose it only uses the (Native int or int64) dtype since each one
>> would need a code path to run quickly.
>>
>> 2. I would describe this as a bug. I think sequences are converted to
>> arrays and in this case the conversion is not returning a 2 element object
>> array but expanding and then concatenating.
>>
>
> No, it's broadcasting the two to a common shape, as documented.
> https://numpy.org/doc/stable/reference/generated/numpy.choose.html
>
>
>> E.g.,
>>> a = (0,1,1,0,0,0,1,1)  # binary array
>>> np.choose(a, (0, range(8)))  # array([0, 1, 2, 0, 0, 0, 6, 7])
>>>
>>
> This is equivalent to `np.choose(a, (np.zeros(8, dtype=int), range(8)))`
>
> --
> Robert Kern
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bad CRC errors when using np.savez, only sometimes though!

2021-05-14 Thread Kevin Sheppard
You could use pandas if your arrays are 2d to serialize using parquet or HDF.  If your arrays are 3d, you could use xarray and NetCDF. I think in both cases if your arrays are standard dtypes then you should be able to avoid copies. Both support compression.

Kevin

From: Isaac Gerg
Sent: Friday, May 14, 2021 3:22 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] bad CRC errors when using np.savez, only sometimes though!

Hi Ben, I am not sure.  However, in looking at the dates, it looks like that was fixed in scipy as of 2019.  Would you recommend using the scipy save interface as opposed to the numpy one?

On Fri, May 14, 2021 at 10:16 AM Benjamin Root  wrote:

Perhaps it is a similar bug as this one? https://github.com/scipy/scipy/issues/6999

Basically, it turned out that the CRC was getting computed on an unflushed buffer, or something like that.

On Fri, May 14, 2021 at 10:05 AM Isaac Gerg  wrote:

I am using 1.19.5 on Windows 10 using Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)].

I have two python processes running (i.e. no threads) which do independent processing jobs and NOT writing to the same directories.  Each process runs for 5-10 hours and then writes out a ~900MB npz file containing 4 arrays.

When I go back to read in the npz files, I will sporadically get bad CRC errors which are related to npz using ziplib.  I cannot figure out why this is happening.  Looking through online forums, other folks have had CRC problems but they seem to be isolated to specifically using ziblib, not numpy.  I have found a few mentions though of ziplib causing headaches if the same file pointer is used across calls when one uses the file handle interface to ziblib as opposed to passing in a filename.

I have verified with 7zip that the files do in fact have a CRC error so it's not an artifact of the ziblib.  I have also used the file handle interface to np.load and still get the error.

Aside from writing my own numpy storage file container, I am stumped as to how to fix this, or reproduce this in a consistent manner.  Any suggestions would be greatly appreciated!

Thank you,
Isaac
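
A minimal sketch (mine, not from the thread) of the pandas/Parquet route for a 2d array; it assumes a Parquet engine such as pyarrow is installed:

import numpy as np
import pandas as pd

arr = np.random.default_rng(0).random((1000, 4))

# Write and read back; float64 values round-trip exactly through Parquet.
pd.DataFrame(arr, columns=["c0", "c1", "c2", "c3"]).to_parquet("arrays.parquet")
arr2 = pd.read_parquet("arrays.parquet").to_numpy()
assert np.array_equal(arr, arr2)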
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Interface for wrapping random distribution functions (__array_function__, __ufunc__, ?)

2021-06-30 Thread Kevin Sheppard
There is an issue that suggests reimplementing many of the Generator
methods as ufuncs so that they would inherit all of the free features of
ufuncs. It seems like a doable but large project for some useful but
nonessential features.

Kevin


On Wed, Jun 30, 2021, 20:49 Robert Kern  wrote:

> On Wed, Jun 30, 2021 at 3:34 PM Yoann Mocquin  wrote:
>
>>
>>
>> Hello there,
>>
>> here is a feature request for the possibility to wrap numpy random
>> distributions to be wrapped by a mechanism like __array_function__ or
>> __array_ufunc__. You can find the GH issue at :
>> https://github.com/numpy/numpy/issues/19382.
>>
>>
>>
>> This post follows [this one](https://github.com/numpy/numpy/issues/18902).
>> I figured I should open another issue for `np.random.normal`, but this
>> applies for all distributions I guess.
>>
>> ## Feature
>>
>> Basicaly, I would like that `numpy.random.*` distributions could trigger
>> an interface when passed instances from custom classes, "à la"
>> `__array_ufunc__` or `__array_function__`. Here is an example concept :
>>
>
>  The main problem is that these are not functions but methods on a hidden
> global `RandomState` instance. I haven't kept up fully with all of the
> efforts in the `__array_*__` proposals, but I do see that methods are
> explicitly excluded from the `__array_function__` mechanism:
>
>   https://numpy.org/neps/nep-0018-array-function-protocol.html#non-goals
>
> The `RandomState` functionality (and thus all of these aliases in
> `numpy.random`) are now frozen in functionality (per NEP 19), so we will
> not be adding this functionality to them. If the `__array_function__`
> developments eventually work out a good way to wrap methods, then we can
> think about using that on `Generator`, but I suspect that will not be
> straightforward.
>
> --
> Robert Kern
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] reducing effort spent on wheel builds?

2021-07-15 Thread Kevin Sheppard
When thinking about supporting a platform, it seems reasonable to consider other sources for pre-compiled binaries, e.g., conda and especially conda-forge. Conda-forge has

linux-ppc64le  v1.7.0
linux-64  v1.7.0
linux-aarch64  v1.7.0
osx-arm64  v1.7.0
osx-64  v1.7.0
win-32  v1.2.1
win-64  v1.7.0

If this list included PyPy then it would be virtually complete. It seems reasonable to direct users to conda-forge if they are unable to build a package for a less common platform. Building on conda-forge is often easier than building a wheel for PyPI due to the substantial infrastructure that the conda-forge team provides to automatically wire up CI for a lot of platforms.

Kevin

From: Ralf Gommers
Sent: Thursday, July 15, 2021 2:12 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] reducing effort spent on wheel builds?

On Thu, Jul 15, 2021 at 2:41 PM Matti Picus  wrote:

On 15/7/21 1:21 pm, Ralf Gommers wrote:
> Hey all,
>
> I've seen Chuck and Matti burn a lot of time on the numpy-wheels repo
> again recently, and I've done the same for SciPy. ...
>
> Cheers,
> Ralf

Since my name was mentioned, the things I have spent time on for wheel packaging, in order of time spent (not scientifically measured, rather looking back at the closed PRs and thinking "oh yeah, that was painful"), have been

- Updating manylinux to move away from 10 year old glibc on linux (still stuck and not clear how to finish it [0])
- Moving from travis-ci.org to travis-ci.com (with the panic around build credits) and from Appveyor to github actions/azure pipelines
- Moving from rackspace's wheel hosting to anaconda.org
- Working around CI failures with aarch64 for linux, mainly due to shortcomings in the free CI providers
- Bugs in dependencies: avoiding buggy Accelerate when building NumPy and debugging Windows/aarch64 problems with OpenBLAS
- Updating OpenBLAS versions
- Shepherding Apple M1 hardware through the manylinux/multibuild/wheel pipeline (all the hard work was done by others)
- Trying to make 64-bit interfaces for OpenBLAS work (99.9% done by Chuck)
- Updating PyPy versions

Thanks Matti! This list is super helpful.

Only the last one, which was actually the least painful, would be helped by Ralf's list.

Not so sure about that - probably the single biggest pain points are CI providers (especially the exotic ones) and OpenBLAS - a dependency we struggle with mostly because of wheels. Without ppc64le and s390x (forgot that one) we wouldn't need TravisCI at all for example, and we would have less work on https://github.com/MacPython/openblas-libs too.

On the other hand, packaging is made harder as more technologies go into a wheel build. The twitter thread started with "SciPy added a required dependency on a technology which broke things, but people stepped up to fix the problem quickly" and morphed into "lets drop wheels for lesser used platforms". I am not sure the discussion should have moved away from the first point so quickly. Perhaps there should be some discussion of the cost of adding new build dependencies and somehow making the dependency conditional for a release or two until all the worst kinks are worked out.

Not quite the right mailing list, but: it *is* an optional dependency in scipy 1.7.0, and there has been no proposal so far to make it required.

For the record, I am +1 on removing sdists from PyPI until pip changes its default to --only-binary :all: [1]

Matti

[0] https://github.com/pypa/manylinux/issues/1012

Ah yes:( That just seems like a major oversight in the perennial manylinux spec.
If no one fixes it, guess we wait till we can jump to `_2_??` with a higher GCC requirement.

Cheers,
Ralf

[1] https://github.com/pypa/pip/issues/9140
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Re: Running numpy.test() after pip install

2021-09-28 Thread Kevin Sheppard
I think NumPy has always tried to have effectively no install dependencies, at least when installed from source (a minimal install from git only requires Cython).  This seems like a good goal to me and there are scenarios where extra packages are a burden (e.g., Amazon Lambda).  AFAICT pretty much every package in the numerical python ecosystem excludes test dependencies from setup requires or install requires.

Kevin

From: Bennet Fauber
Sent: Tuesday, September 28, 2021 6:00 PM
To: numpy-discussion@python.org
Subject: [Numpy-discussion] Running numpy.test() after pip install

I just installed NumPy using pip and Python 3.9.7 on CentOS 7.9. Installation went fine, but when I ran

>>> import numpy
>>> numpy.test()

I got a traceback, and this summary.

=== short test summary info ===
ERROR  - ModuleNotFoundError: No module named 'hypothesis'
Interrupted: 1 error during collection

It seems that both hypothesis and pytest are needed to run numpy.test(). We like to be able to run all the tests even after a pip install, and being able to run some of the tests from pip-installed numpy is useful as well.

May I suggest you add those two packages as dependencies for NumPy at PyPI?  Neither are particularly large, both are generally useful, and I think the potential utility for those who would like to check on their own system for anomalies outweighs the minimal impact on the majority of users who would not use it but would also probably not notice their presence.

If adding them as dependencies seems too heavyweight, or is otherwise deemed undesirable, perhaps just a note in the Project Description at https://pypi.org/project/numpy/ to say that, if you want to run tests, those two packages will be needed?

Thanks,
    -- bennet
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: What happened to the numpy.random documentation?

2021-10-14 Thread Kevin Sheppard
I think the issue in random specifically is that a raw list of
available functions does not provide suitable guidance for someone looking
for a random variate generating function.  This is because the module-level
API is mostly dominated by methods of the singleton RandomState instance.
Best practice going forward is to use the methods of a Generator instance,
most likely provided by default_rng(). A simple API-list will not be able
to provide this guidance.

FFT has a very simple API and so a simple list makes sense.  Similarly for
np.random before the generators were revamped, which is why the old style was
adequate for <=1.16, but not for >=1.17.

Kevin


On Thu, Oct 14, 2021 at 6:09 PM Paul M.  wrote:

> Hi Melissa,
>
> I think that's the right approach.  Looking through the current docs, I
> think the page on the FFT module is exemplary in this regard:
>
> https://numpy.org/doc/stable/reference/routines.fft.html
>
> It lists all the available functions (with links to details), and then has
> a section on "Background Information", "Implementation Details", etc.  It's
> easy to get a quick overview of what the available functions are, and then
> ease into the background info in terms of how it works.
>
> Cheers,
> Paul
>
>
> On Thu, Oct 14, 2021 at 12:44 PM Melissa Mendonça 
> wrote:
>
>> Hi Paul,
>>
>> Do you think having a page with the flat list of routines back, in
>> addition to the explanations, would solve this?
>>
>> - Melissa
>>
>> On Thu, Oct 14, 2021 at 1:34 PM Paul M.  wrote:
>>
>>> Hi All,
>>>
>>> The documentation of Numpy's submodules  used to have a fairly standard
>>> structure as shown here in the 1.16 documentation:
>>>
>>>   https://docs.scipy.org/doc/numpy-1.16.1/reference/routines.random.html
>>>
>>> Now the same page in the API documentation looks like this:
>>>
>>>   https://numpy.org/doc/stable/reference/random/index.html
>>>
>>> While I appreciate the expository text in the new documentation about
>>> how the generators work, this new version is much less useful as a
>>> reference to the API.  It seems like it might fit better in the user manual
>>> rather than the API reference.
>>>
>>> From my perspective it seems like the new version of the documentation
>>> is harder to navigate in terms of finding information quickly (more
>>> scrolling, harder to get a bird's eye view of functions in various
>>> submodules, etc).
>>>
>>> Has anyone else had a similar reaction to the changes? I teach a couple
>>> of courses in scientific computing and bioinformatics and my students seem
>>> to also struggle to get a sense of what the different modules offer based
>>> on the new version of the documentation. For now, I'm referring them to the
>>> old (1.70) reference manuals as a better way to get acquainted with the
>>> libraries.
>>>
>>> Cheers,
>>> Paul Magwene
>>> ___
>>> NumPy-Discussion mailing list -- numpy-discussion@python.org
>>> To unsubscribe send an email to numpy-discussion-le...@python.org
>>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>>> Member address: meliss...@gmail.com
>>>
>> ___
>> NumPy-Discussion mailing list -- numpy-discussion@python.org
>> To unsubscribe send an email to numpy-discussion-le...@python.org
>> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
>> Member address: pmma...@gmail.com
>>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: kevin.k.shepp...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: What happened to the numpy.random documentation?

2021-10-15 Thread Kevin Sheppard
On Thu, Oct 14, 2021 at 7:22 PM Ralf Gommers  wrote:

>
>
> On Thu, Oct 14, 2021 at 7:19 PM Kevin Sheppard 
> wrote:
>
>> I think the issue in random specifically is that a raw list of
>> available functions does not provide suitable guidance for someone looking
>> for random variate generating function.  This is because the module-level
>> API is mostly dominated by methods of the singleton RandomState instance.
>> Best practice going forward is to use the methods of a Generator instance,
>> most likely provided by default_rng(). A simple API-list will not be able
>> to provide this guidance.
>>
>
> The list can be annotated with headings and one-line or one-paragraph
> descriptions, something like:
>
> ```
> ## Generator interface
>
> This is the recommended interface ... 
>
> ## Interface for unit testing and legacy code
>
> 
> ```
>
> The complaint is very much valid here, I have made the same one before.
> The way the page currently is written makes little sense - it addresses a
> user transitioning from the old to the new interface, or explicitly
> comparing the two for some reason. To a user just looking for information
> on NumPy today, that's more confusing than helpful.
>
> The page also talks about "The new interface", "What's new and different",
> "Some long-overdue API cleanup", and "Since Numpy version 1.17.0" - that
> all belongs in a NEP, and not in the API reference docs.
>
> Cheers,
> Ralf
>
>
>
I don't think the doc style there is ideal.  I would still say that a
relatively naive dump of `np.random` (something that would be everything in
[v for v in dir(np.random) if not v.startswith("_")]) would not lead to an
ideal set of docs because most of the "obvious" functions are methods of the
legacy RandomState.  A good set would need something like (excluding
headers)

default_rng
Generator
Generator.random
Generator.integers
...


Legacy Methods
==

np.random.random_sample
np.random.randint
RandomState
...

IMO many (likely most) methods exposed in np.random should not be on the
default landing page for np.random.

Best,
Kevin



>> FFT has a very simple API and so a simple list make sense.  Similarly,
>> np.random before the generation was revamped, which is hy the old-style was
>> adequate for <=1.16, but not for >=1.17
>>
>> Kevin
>>
>>
>> On Thu, Oct 14, 2021 at 6:09 PM Paul M.  wrote:
>>
>>> Hi Melissa,
>>>
>>> I think that's the right approach.  Looking through the current docs, I
>>> think the page on the FFT module is exemplary in this regard:
>>>
>>> https://numpy.org/doc/stable/reference/routines.fft.html
>>>
>>> It lists all the available functions (with links to details), and then
>>> has a section on "Background Information", "Implementation Details", etc.
>>> It's easy to get a quick overview of what the available functions are, and
>>> then ease into the background info in terms of how it works.
>>>
>>> Cheers,
>>> Paul
>>>
>>>
>>> On Thu, Oct 14, 2021 at 12:44 PM Melissa Mendonça 
>>> wrote:
>>>
>>>> Hi Paul,
>>>>
>>>> Do you think having a page with the flat list of routines back, in
>>>> addition to the explanations, would solve this?
>>>>
>>>> - Melissa
>>>>
>>>> On Thu, Oct 14, 2021 at 1:34 PM Paul M.  wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> The documentation of Numpy's submodules  used to have a fairly
>>>>> standard structure as shown here in the 1.16 documentation:
>>>>>
>>>>>
>>>>> https://docs.scipy.org/doc/numpy-1.16.1/reference/routines.random.html
>>>>>
>>>>> Now the same page in the API documentation looks like this:
>>>>>
>>>>>   https://numpy.org/doc/stable/reference/random/index.html
>>>>>
>>>>> While I appreciate the expository text in the new documentation about
>>>>> how the generators work, this new version is much less useful as a
>>>>> reference to the API.  It seems like it might fit better in the user 
>>>>> manual
>>>>> rather than the API reference.
>>>>>
>>>>> From my perspective it seems like the new version of the documentation
>>>>> is harder to navigate in terms of finding information quickly (more
>>>>> scrolling, harder to get a bird's eye view of functions in various
>>>>> submodules, etc).

[Numpy-discussion] Re: Consistency of random number functions APIs.

2021-11-01 Thread Kevin Sheppard
> Some NumPy random number generation functions take a dtype parameter
> whereas others don't.  Some of them take an out parameter whereas others
> don't.  Just glancing at it, there seems to be no rhyme or reason why this
> would be the case but is there some hidden consistency underneath the hood
> to explain why some have these params and others don't?  Is there any
> reason that things like random.randn and numpy.random.Generator.normal
> don't take a dtype and out parameters?  If I need to create a huge array of
> random numbers whose dtype is float16 or float32 then what is the BKM to do
> this when the routine I would like to use generates an array of float64 and
> with the 64-bit data type the array won't fit in memory?
>
>
There is definitely inconsistency.  The out keyword was added to a core set
of floating generators as part of a feature request to the project that
became Generator.  The dtype argument was the same. integers already had
dtype support so it was different from the core generators that have 32-bit
implementations. The floating generators that have `dtype` support all use
bit-efficient methods internally and so are not just `astype(np.float32)`
wrappers around the double-precision implementations.
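
For reference, a small sketch (mine, not from the thread) of the Generator methods that do take dtype/out, which avoids materialising a float64 intermediate for large draws:

import numpy as np

rng = np.random.default_rng(0)

# Drawn directly in single precision rather than cast from float64.
x = rng.random(size=10_000_000, dtype=np.float32)

# Fill a preallocated buffer in place via out=.
buf = np.empty(10_000_000, dtype=np.float32)
rng.standard_normal(dtype=np.float32, out=buf)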

IMO a redesign of the internals would be needed to support `out` and
`dtype`. There is an issue about rewriting variate construction functions
as ufuncs so that one would get all of these for free.  This would need a
dedicated individual to undertake it since it would not be possible (as far
as I know) to do this using Cython.  This would also raise the maintenance
bar for many users. Another possible approach would be to use C++ and
templates now that these are (slowly) moving into the mainstream.

In the meantime, Generator is meant to be extensible using either compiled
code (C/Cython) or JIT code (Numba), so that you can pretty easily produce
fast, consistent generation code across a wide range of use cases.

Finally as noted previously, all of the generators in the np.random
namespace, which are methods of a RandomState object, are forever frozen in
their API (per NEP, and also effectively frozen in implementation as well).
The only place for changes is np.random.Generator (or any successor).

Kevin
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Proposal: Automatic estimation of number of histogram bins for weighted data

2021-12-13 Thread Kevin Sheppard
To me, this feels like it might be a better fit for SciPy or possibly
statsmodels (but maybe not since neither have histogram functions
anymore). The challenge with weighted estimators is how the weights should
be interpreted. Stata covers the most important cases of weights
https://www.reed.edu/psychology/stata/gs/tutorials/weights.html.  Would
these be frequency weights?  Stata supports only frequency weights
https://www.stata.com/manuals/u11.pdf#u11.1.6weight.
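
A small illustration (mine, not from the thread) of the frequency-weight reading: a weight of k on a value should behave like k repetitions of that value, which np.histogram's counts already satisfy:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
w = np.array([3, 1, 2])

counts_w, edges = np.histogram(x, bins=4, weights=w)
counts_r, _ = np.histogram(np.repeat(x, w), bins=edges)
assert np.array_equal(counts_w, counts_r)  # weighted counts match repeated data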

Kevin


On Sun, Dec 12, 2021 at 9:45 AM Jonathan Crall  wrote:

> Hi all, this is my first post on this mailing list.
>
> I'm writing to propose a method for extending the histogram bandwidth
> estimators to work with weighted data. I originally submitted this proposal
> to seaborn: https://github.com/mwaskom/seaborn/issues/2710 and mwaskom
> suggested I take it here.
>
> Currently the unweighted auto heuristic is a combination of
> the Freedman-Diaconis and Sturges estimator. For reference, these rules are
> as follows:
>
> Sturges: return the peak-to-peak ptp (i.e. x.max() - x.min()) and number
> of data points total=x.size. Then divide ptp by the log of one plus the
> number of data points.
>
> ptp / log2(total + 2)
>
> Freedman-Diaconis: Find the interquartile-range of the data
> iqr=(np.subtract(*np.percentile(x, [75, 25]))) and the number of data
> points total=x.size, then apply the formula:
>
> 2.0 * iqr * total ** (-1.0 / 3.0).
>
> Taking a look at these it seems (please correct me if I'm missing
> something that makes this not work) that there is a simple extension to
> weighted data. If we can find a weighted replacement for p2p, total, and
> iqr, the formulas should work exactly the same in the weighted case.
>
> The p2p case seems easy. Even if the data points are weighed, that doesn't
> change the min and max. Nothing changes here.
>
> For total, instead of taking the size of the array (which implicitly
> assumes each data point has a weight of 1), just sum the weight to get
> total=weights.sum().
>
> I believe the IQR is also computable in the weighted case.
>
> import numpy as np
> n = 10
> rng = np.random.RandomState(12554)
>
>
> x = rng.rand(n)
> w = rng.rand(n)
>
>
> sorted_idxs = x.argsort()
> x_sort = x[sorted_idxs]
> w_sort = w[sorted_idxs]
>
>
> cumtotal = w_sort.cumsum()
> quantiles = cumtotal / cumtotal[-1]
> idx2, idx1 = np.searchsorted(quantiles, [0.75, 0.25])
> iqr_weighted = x_sort[idx2] - x_sort[idx1]
> print('iqr_weighted = {!r}'.format(iqr_weighted))
>
>
> # test that this is roughly the same for the "unweighted case"
> # (won't be exactly the same because this method does not have
> interpolation)
> w = np.ones_like(x)
>
>
> w_sort = w[sorted_idxs]
> cumtotal = w_sort.cumsum()
> quantiles = cumtotal / cumtotal[-1]
> idx2, idx1 = np.searchsorted(quantiles, [0.75, 0.25])
> iqr_weighted = x_sort[idx2] - x_sort[idx1]
> iqr_unweighted_repo = x_sort[idx2] - x_sort[idx1]
> print('iqr_unweighted_repo = {!r}'.format(iqr_unweighted_repo))
>
>
> iqr_unweighted_orig = np.subtract(*np.percentile(x, [75, 25]))
> print('iqr_unweighted_orig = {!r}'.format(iqr_unweighted_orig))
>
>
> This quick and dirty method of weighted quantiles gives a close result
> (which is probably fine for a bandwidth estimator):
>
> iqr_weighted = 0.21964093625695036
> iqr_unweighted_repo = 0.36649977003903755
> iqr_unweighted_orig = 0.30888312408540963
>
> And I do see there is an open issue / PR for
> weighted quantiles/percentiles: https://github.com/numpy/numpy/issues/8935
>  https://github.com/numpy/numpy/pull/9211 so this code could make use of
> that after it lands.
>
> Lastly, I think the most common case (or at least my case) for using a
> weighted histogram is to combine multiple histograms. In this case the
> number of estimated bins might be greater than the number of weighted data
> points, and a simple min condition on that number and the estimated number
> of bins should take care of that.
>
> Please let me know: thoughts / opinions / ideas on this topic. I did do
> some searching for related discussion, but I may have missed it, so point
> me to that if I missed it. Also if the reason this feature does not exist
> is because there is some theoretical problem with estimating bandwidth for
> weighted data that I'm unaware of, I'd be interested to learn about that
> (although I can't see that being the case because these are just heuristics
> after all, and I have validated that this works well in my own use-cases).
>
> --
> -Dr. Jon Crall (him)
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: kevin.k.shepp...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/l

[Numpy-discussion] Re: MacPython travis funds have run out (again)

2022-01-14 Thread Kevin Sheppard
As a data point against relying on QEMU: statsmodels times out (5+ hours)
when trying to build with QEMU using cibuildwheel and GitHub Actions.

Kevin


On Fri, Jan 14, 2022, 08:53 Matti Picus  wrote:

>
> On 14/1/22 3:04 am, Charles R Harris wrote:
> > Hi All,
> >
> > Travis wheel build problems again. There is still a positive balance,
> > but apparently it is insufficiently positive. Which is strange if
> > ARM64 is free. Wheel builds have been failing since last weekend.
> >
> > Matti, could you ask for more credits? I'm thinking we really need to
> > move off travis for ARM builds.
> >
> > Chuck
> >
>
> I sent a request to supp...@travis-ci.com.
>
>
> If we want to move off travis for aarch64, I think the only option right
> now is to use emulation, conda-forge does this on Azure machines [0] and
> the build + tests takes ~40 minutes. I can only imagine how long it
> would take to do move SciPy to a QEMU.
>
>
> Do we have a way to reach out to an industry contact in the ARM
> foundation or to AWS (who have aarch 64 cloud machines) or NVidia (who
> also manufacture aarch64 machines) and discuss the problem with them?
>
> Matti
>
>
> [0]
>
> https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=434709&view=logs&jobId=e9afaa34-1a0f-534e-78fc-fae528ccd915&j=e9afaa34-1a0f-534e-78fc-fae528ccd915&t=48395e60-bc5f-5e45-7455-d404f0c95f3a
>
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: kevin.k.shepp...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: MacPython travis funds have run out (again)

2022-01-14 Thread Kevin Sheppard
Thanks for asking, and prompting me to take a look – it turns out the aarch64 built fine:

cp310-manylinux_aarch64 finished in 2784.95s

What timed out was:

Building cp310-musllinux_aarch64 wheel

which would have needed to build NumPy, then SciPy, Cython, then statsmodels.  Now we can have aarch64 wheels once I block musllinux_aarch64.

Kevin

From: Matti Picus
Sent: Friday, January 14, 2022 10:31 AM
To: numpy-discussion@python.org
Subject: [Numpy-discussion] Re: MacPython travis funds have run out (again)

> On Fri, Jan 14, 2022, 08:53 Matti Picus wrote:
> ...
> If we want to move off travis for aarch64, I think the only option
> right now is to use emulation, conda-forge does this on Azure machines
> [0] and the build + tests takes ~40 minutes. I can only imagine how
> long it would take to do move SciPy to a QEMU.
> ...
> Matti
>
> [0]
> https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=434709&view=logs&jobId=e9afaa34-1a0f-534e-78fc-fae528ccd915&j=e9afaa34-1a0f-534e-78fc-fae528ccd915&t=48395e60-bc5f-5e45-7455-d404f0c95f3a

On 14/1/22 12:08 pm, Kevin Sheppard wrote:
>> As a data point against relying on QEMU: statsmodels times out (5+
>> hours) when trying to build with QEMU using cibuildwheel and GitHub
>> actions.
>>
>> Kevin

Interesting. It seems to work on the azure builds from conda-forge, I wonder what is different. Are the timed-out build logs still around?

Matti

[1] https://github.com/conda-forge/statsmodels-feedstock/runs/4196021348
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: seed for new random number generator

2022-06-19 Thread Kevin Sheppard
On Sun, Jun 19, 2022 at 4:36 PM Robert Kern  wrote:

> On Sun, Jun 19, 2022 at 9:37 AM Pieter Eendebak 
> wrote:
>
>> Hi everyone,
>>
>> The new numpy random interface (e.g. r=numpy.random.default_rng;
>> r.random) is much faster than the old one (e.g. np.random.random). When
>> converting  code from the old style to the new style I miss having a way to
>> set the seed of the RNG
>>
>> I tried:
>>
>> rng.bit_generator = np.random.PCG64(seed=42) # fails, with good error
>> message
>> rng.bit_generator.state['state']['state']=42 # has no effect, perhaps
>> make this dict read-only?
>>
>> Is there a way to set the seed without creating a new RNG object?
>>
>
> We generally recommend just creating a new Generator and passing that
> around in almost all cases. Whenever that can possibly be made to work,
> please do that. The use of np.random.seed() is usually a code smell
> indicating that someone was working around the fact that there was just the
> one global underneath np.random.random() et al. When you don't have the
> constraint of a single global instance, it's almost always going to be
> better to use multiple instances; you don't need that workaround anymore.
>
> There are ways to punch in a new BitGenerator into an existing Generator,
> if you must, and also ways to punch in a state dict into the BitGenerator,
> but these are largely for various meta-programming tasks like serialization
> rather than day-to-day use of pseudorandom numbers. If you are converting
> old code that happened to use np.random.seed(), it is almost certainly not
> one of these tasks.
>
> If you want to describe your use case more specifically, I can give you
> some more guidance on good patterns to replace it with the new system.
>
> --
> Robert Kern
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: kevin.k.shepp...@gmail.com


At some point we had a seed implementation on the BitGenerator. This was
benchmarked against creating a fresh BitGenerator and there was no
measurable performance gain to the seed() method over creating a fresh
BitGenerator. If you want to reseed, the simplest method is to simply call
default_rng again.

import numpy as np

SEED = 382193802183092174983

rg = np.random.default_rng(SEED)
rg.standard_normal((1000,100))
# Do stuff
rg = np.random.default_rng(SEED)
rg.standard_normal((1000,100))

Kevin
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Setting C and linker flags for Windows build

2022-06-29 Thread Kevin Sheppard
>
>
> Hi,
>
> I am very sorry - I feel I should know this, or be able to work it
> out, but is there a way of setting the flags to the C compiler and the
> linker, for the Numpy build, on Windows?
>
> I'm trying to set the flags for a build with Windows mingw-w64 - but I
> believe Numpy is ignoring $env:LDFLAGS, $env:CFLAGS and $env:OPT - and
> I can't see any way of setting these options from the command line.
> Am I missing something?
>
> Cheers,
>
> Matthew
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: kevin.k.shepp...@gmail.com


I think these are all hard coded and cannot be changed.  This was the case when
I got NumPy building with clang-cl.

Kevin
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Representation of NumPy scalars

2022-09-09 Thread Kevin Sheppard
+1 from me. They are a frequent source of confusion when starting, and
there appear to be far fewer now than in earlier releases.  It also might
make it easier to spot any inadvertent scalars coming out if these could be
Python floats.

Kevin

On Fri, Sep 9, 2022, 07:23 Stefan van der Walt  wrote:

> I am in favor of such a change. It will make what is returned more
> transparent to users (and reduce confusion for newcomers).
>
> With NEP50, we're already adopting a philosophy of explicit scalar usage
> anyway: no longer pretending or trying to make transparent that Python
> floats and NumPy floats are the same.
>
> No one *actually* round-trips objects via repr, but if a user could look
> at a result and know how to construct the object, that is an improvement.
>
> Stéfan
>
> On Thu, Sep 8, 2022, at 22:26, Matti Picus wrote:
> > On 9/9/22 04:15, Warren Weckesser wrote:
> >> ...
> >> To quote from https://docs.python.org/3/library/functions.html#repr:
> >>
> >>> For many types, this function makes an attempt to return a string
> >>> that would yield an object with the same value when passed to eval();
> >> Sebastian, is this an explicit goal of the change?  (Personally, I've
> >> gotten used to not taking this too seriously, but my world view is
> >> biased by the long-term use of NumPy, which has never followed this
> >> guideline.)
> >>
> >> If that is a goal, than the floating point types with precision
> >> greater than double precision will need to display the argument of the
> >> type as a string.  For example, the following is run on a platform
> >> where numpy.longdouble is extended precision (80 bits):
> >>
> >> ```
> >> In [161]: longpi = np.longdouble('3.14159265358979323846')
> >>
> >> In [162]: longpi
> >> Out[162]: 3.1415926535897932385
> >>
> >> In [163]: np.longdouble(3.1415926535897932385)  # Argument is parsed
> >> as 64 bit float
> >> Out[163]: 3.141592653589793116
> >>
> >> In [164]: np.longdouble('3.1415926535897932385')  # Correctly
> >> reproduces the longdouble
> >> Out[164]: 3.1415926535897932385
> >> ```
> >>
> >> Warren
> >
> >
> > As others have mentioned, the change will greatly enhance UX at the cost
> > of documentation cleanups. While the representation may not be perfectly
> > roundtrip-able, I think it still is an improvement and worthwhile.
> > Elsewhere I have suggested we need more documentation around
> > array/scalar printing, perhaps that would be a place to mention the
> > limitations of string representations.
> >
> > Matti
> ___
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: kevin.k.shepp...@gmail.com
>
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: How to avoid this memory copy overhead in d=a*b+c?

2022-09-16 Thread Kevin Sheppard
You can use inplace operators where appropriate to avoid memory allocation.

a *= b
c += a

Kevin

From: 腾刘
Sent: Friday, September 16, 2022 8:35 AM
To: Discussion of Numerical Python
Subject: [Numpy-discussion] How to avoid this memory copy overhead in d=a*b+c?

Hello everyone, I'm here again to ask a naive question about Numpy performance.

As far as I know, Numpy's vectorization operator is very effective because it utilizes SIMD instructions and multi-threads compared to index-style programming (using a "for" loop and assigning each element with its index in array).

I'm wondering how fast Numpy could be so I did some experiments. Take this simple task as an example:

    a = np.random.rand(10 000 000)
    b = np.random.rand(10 000 000)
    c = a + b

To check the performance, I wrote a simple C++ implementation of adding two arrays using multi-threads too (with the compile options of: -O3 -mavx2). I found that the C++ implementation is slightly faster than Numpy (running 100 times each to get a rather convincing statistic).

Here comes the first question, how come there is this efficiency gap?
I guess this is because Numpy needs to load the shared object and find the wrapper of ufunc and then finally execute the underlying computation. Am I right? Am I missing something here?

Then I did another experiment for this statement: d = a * b + c, where a, b, c and d are all numpy arrays. I also use C++ to implement this logic and found that C++ is 2 times faster than Numpy on average (also executed 100 times each).

I guess this is because in python we first calculate:

    temporary_var = a * b

and then:

    d = temporary_var + c

so we have an unnecessary memory transfer overhead. Since each array is very large, Numpy needs to write temporary_var to memory and then read it back to cache. However in C++ we could just write d[i] = a[i] * b[i] + c[i] and we won't create a temporary array along with the memory transfer penalty.

So another problem is if there is a method to avoid this kind of overhead? I've learned that in Numpy we could create our own ufunc with: frompyfunc, but it seems that there is no SIMD optimization nor multi-threads utilized since this is 100 times slower than the "d = a * b + c" way.
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: How to avoid this memory copy overhead in d=a*b+c?

2022-09-16 Thread Kevin Sheppard
Have a look at numexpr (https://github.com/pydata/numexpr).  It can achieve
large speedups in ops like this at the cost of having to write expensive
operations as strings, e.g., d = ne.evaluate('a * b + c').  You could also
write a gufunc in numba that would be memory and access efficient.
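
As a sketch of that fused-kernel route (mine, not from the thread; it assumes numba is installed and uses a plain @vectorize ufunc rather than a gufunc for brevity):

import numpy as np
from numba import vectorize

@vectorize(["float64(float64, float64, float64)"], target="parallel")
def fma(a, b, c):
    # One pass over memory; no temporary array is allocated for a * b.
    return a * b + c

a = np.random.rand(10_000_000)
b = np.random.rand(10_000_000)
c = np.random.rand(10_000_000)
d = fma(a, b, c)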

Kevin


On Fri, Sep 16, 2022 at 8:53 AM 腾刘 <27rabbi...@gmail.com> wrote:

> Thanks a lot for answering this question but I still have some
> uncertainties.
>
> I 'm trying to improve the time efficiency as much as possible so I 'm not
> mainly worried about memory allocation, since in my opinion it won't cost
> much.
> Instead, the memory accessing is my central concern because of the cache
> miss penalty.
>
> In your snippet, there will be 4 accesses to the whole arrays, which is:
>
> the access to a (in a *= b)
> the access to b (in a *= b)
> the access to c (in c += a)
> the access to a (in c += a)
>
> This is better than d = a * b + c, but I truly need a new array d to hold
> the final result because I don't want to spoil the data in array c also.
>
> So let's replace c += a with d = a + c, and in this way there will be 5
> accesses to the whole array in total.
>
> However, under optimal conditions, which can be achieved by C++
> implementation ( d[i] = a[i] * b[i] + c[i]), we only need four accesses to
> the whole array.
>
> In modern CPU, the cost of this kind of simple calculation is negligible
> compared to memory access, so I guess I still need a better way.
>
> So much thanks again for your reply!
>
On Fri, Sep 16, 2022 at 15:38, Kevin Sheppard  wrote:
>
>> You can use inplace operators where appropriate to avoid memory
>> allocation.
>>
>>
>>
>> a *= b
>>
>> c += a
>>
>>
>>
>> Kevin
>>
>>
>>
>>
>>
>> *From: *腾刘 <27rabbi...@gmail.com>
>> *Sent: *Friday, September 16, 2022 8:35 AM
>> *To: *Discussion of Numerical Python 
>> *Subject: *[Numpy-discussion] How to avoid this memory copy overhead in
>> d=a*b+c?
>>
>>
>>
>> Hello everyone, I 'm here again to ask a naive question about Numpy
>> performance.
>>
>>
>>
>> As far as I know, Numpy's vectorization operator is very effective
>> because it utilizes SIMD instructions and multi-threads compared to
>> index-style programming (using a "for" loop and assigning each element with
>> its index in array).
>>
>>
>>
>> I 'm wondering how fast Numpy could be so I did some experiments. Take
>> this simple task as an example:
>>
>> a = np.random.rand(10 000 000)
>>
>> b = np.random.rand(10 000 000)
>>
>> c = a + b
>>
>>
>>
>> To check the performance, I wrote a simple C++ implementation of adding
>> two arrays using multi-threads too (with the compile options of: -O3
>> -mavx2). I found that the C++ implementation is slightly faster than Numpy
>> (running 100 times each to get a rather convincing statistic).
>>
>>
>>
>> *Here comes the first question, how come there is this efficiency gap?*
>>
>> I guess this is because Numpy needs to load the shared object and find
>> the wrapper of ufunc and then finally execute the underlying computation.
>> Am I right? Am I missing something here?
>>
>>
>>
>> Then I did another experiment for this statement:  d = a * b + c , where
>> a, b, c and d are all numpy arrays. I also use C++ to implement this logic
>> and found that C++ is 2 times faster than Numpy on average (also executed
>> 100 times each).
>>
>>
>>
>> I guess this is because in python we first calculate:
>>
>> temporary_var = a * b
>>
>> and then:
>>
>> d = temporary_var + c
>>
>> so we have an unnecessary memory transfer overhead. Since each array is
>> very large,  Numpy needs to write temporary_var to memory and then read it
>> back to cache.
>>
>>
>>
>> However in C++ we could just write d[i] = a[i] * b[i] + c[i] and we won't
>> create a temporary array along with the memory transfer penalty.
>>
>>
>>
>> *So another problem is if there is a method to avoid this kind of
>> overhead?* I 've learned that in Numpy we could create our own ufunc
>> with: *frompyfunc*, but it seems that there is no SIMD optimization nor
>> multi-threads utilized since this is 100 times slower than *"d = a * b +
>> c" way*.
>>
>>
>>
>>
>>
>>
>> ___
>> Nu

[Numpy-discussion] Expected behavior of np.array(..., copy=True)

2024-10-08 Thread Kevin Sheppard via NumPy-Discussion
Can anyone shed some light on the expected behavior of code using
array(..., copy=True) with pandas objects? We ran into this in statsmodels
and I think there are probably plenty of places where we explicitly call
array(..., copy=True) and think we should have a totally independent copy
of the data. One workaround is to use np.require(...,requirements="O") but
it would help to understand the expected behavior.

Here is a simple example:

import numpy as np
import pandas as pd

weeks = 2
now = pd.to_datetime('2024-01-01')
testdata = pd.DataFrame(columns=['dates', 'values'])
rg = np.random.default_rng(0)
testdata['dates'] = pd.date_range(start=now, periods=weeks * 7, freq='D')
testdata['values']=rg.integers(0, 100, size=(weeks * 7))

values = testdata['values']
print("*"*10, " Before ", "*"*10)
print(values.head())
arr = np.array(values, copy=True)
arr.sort()
print("*"*10, " After ", "*"*10)
print(values.head())
print("*"*10, " Flags ", "*"*10)
print(arr.flags)

This produces

**  Before  **
0    85
1    63
2    51
3    26
4    30
Name: values, dtype: int64
**  After  **
0     1
1     4
2     7
3    17
4    26
Name: values, dtype: int64
**  Flags  **
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
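
For reference, a minimal sketch (mine, not part of the original report) of the np.require workaround mentioned above, reusing the `values` Series from the example:

# Requesting OWNDATA ("O") forces a copy that does not alias the pandas buffer.
arr = np.require(values, requirements="O")
arr.sort()
print(values.head())          # the Series is left untouched
print(arr.flags["OWNDATA"])   # True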

Thanks,
Kevin
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Proposal to Extend NumPy Broadcasting for (D₁, D₂, ..., N, M) → (D₁, D₂, ..., K, M) When K is a Multiple of N (K % N == 0)

2025-03-25 Thread Kevin Sheppard via NumPy-Discussion
I think one aspect that would be hard to think about is how the tiling
would happen when broadcasting from K -> N where N/K is an integer. There
are at least 2 different ways to tile that would produce different results.
Suppose you have the array

[[1, 2, 3]]

which is (1,3).  If you wanted to broadcast to (1,9), you could want

[[1,1,1,2,2,2,3,3,3]] or [[1,2,3,1,2,3,1,2,3]]

This ambiguity doesn't arise with broadcasting from 1->N. Probably better
to allow users to manually tile in this case.
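
A tiny illustration (mine) of the two inequivalent expansions:

import numpy as np

a = np.array([[1, 2, 3]])        # shape (1, 3)
print(np.repeat(a, 3, axis=1))   # [[1 1 1 2 2 2 3 3 3]]
print(np.tile(a, (1, 3)))        # [[1 2 3 1 2 3 1 2 3]]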

Kevin


On Tue, Mar 25, 2025 at 1:31 PM Shasang Thummar via NumPy-Discussion <
numpy-discussion@python.org> wrote:

> Dear NumPy Developers,
>
>
> I hope you are doing well. I am writing to propose an enhancement to NumPy’s 
> broadcasting mechanism that could make the library even more powerful, 
> intuitive, and flexible while maintaining its memory efficiency.
>
> ## **Current Broadcasting Rule and Its Limitation**
> As per NumPy's current broadcasting rules, if two arrays have different 
> shapes, the smaller array can be expanded **only if one of its dimensions is 
> 1**. This allows memory-efficient expansion of data without copying. However, 
> if a dimension is greater than 1, NumPy **does not** allow expansion to a 
> larger size, even when the larger size is a multiple of the smaller size.
>
> ### **Example of Current Behavior (Allowed Expansion)**
> ```python
> import numpy as np
>
> A = np.array([[1, 2, 3]])  # Shape (1,3)
> B = np.array([[4, 5, 6],# Shape (2,3)
>   [7, 8, 9]])
>
> C = A + B  # Broadcasting works because (1,3) can expand to (2,3)
> print(C)
>
> *Output:*
>
> [[ 5  7  9]
>  [ 8 10 12]]
>
> Here, A has shape (1,3), which is automatically expanded to (2,3) without
> copying because *a dimension of size 1 can be stretched*.
>
> However, *NumPy fails when trying to expand a dimension where N > 1, even
> if the larger size is a multiple of N*.
> *Example of a Case That Fails (Even Though It Could Work)*
>
> A = np.array([[1, 2, 3],# Shape (2,3)
>   [4, 5, 6]])
>
> B = np.array([[10, 20, 30],  # Shape (4,3)
>   [40, 50, 60],
>   [70, 80, 90],
>   [100, 110, 120]])
>
> C = A + B  # Error! NumPy does not allow (2,3) to (4,3)
>
> *This fails with the error:*
>
> ValueError: operands could not be broadcast together with shapes (2,3) (4,3)
>
> *Why Should This Be Allowed?*
>
> If *a larger dimension is an exact multiple of a smaller one*, then
> logically, the smaller array can be *repeated* along that axis *without
> physically copying data*—just like NumPy does when broadcasting (1,M) to
> (N,M).
>
> In the above example,
>
>- A has shape (2,3), and B has shape (4,3).
>- Since 4 is *a multiple of* 2 (4 % 2 == 0), A could be *logically
>repeated 2 times* to become (4,3).
>- NumPy *already does* a similar expansion when a dimension is 1, so
>why not extend the same logic?
>
> *Proposed Behavior (Expanding N → M When M % N == 0)*
>
> Allow an axis with size N to be broadcast to size M *if and only if M is
> an exact multiple of N (M % N == 0)*. This is *just as memory-efficient*
> as the current broadcasting rules because it can be done using *stride
> tricks instead of copying data*.
> *Example of the Proposed Behavior*
>
> If NumPy allowed this new form of broadcasting:
>
> A = np.array([[1, 2, 3],# Shape (2,3)
>   [4, 5, 6]])
>
> B = np.array([[10, 20, 30],  # Shape (4,3)
>   [40, 50, 60],
>   [70, 80, 90],
>   [100, 110, 120]])
>
> # Proposed new broadcasting rule
> C = A + B
>
> print(C)
>
> *Expected Output:*
>
> [[ 11  22  33]
>  [ 44  55  66]
>  [ 71  82  93]
>  [104 115 126]]
>
> This works by *logically repeating A* to match B’s shape (4,3).
> *Why This is a Natural Extension of Broadcasting*
>
>- *Memory Efficiency:* Just like broadcasting (1,M) to (N,M), this
>expansion does *not* require physically copying data. Instead, strides
>can be adjusted to *logically repeat elements*, making this as
>efficient as current broadcasting.
>- *Intuitiveness:* Right now, broadcasting already surprises new
>users. If (1,3) can become (2,3), why not (2,3) to (4,3) when 4 is a
>multiple of 2? This would be a more intuitive rule.
>- *Extends Current Functionality:* This is *not* a new concept—it *extends
>the existing rule* where 1 can be stretched to any value. We are
>generalizing it to *any factor relationship* (N → M when M % N == 0).
>
> *Implementation Considerations*
>
> The logic behind NumPy’s current broadcasting already uses *stride tricks*
> for memory-efficient expansion. Extending it to handle (N, M) → (K, M)
> (where K % N == 0) would require:
>
>- Updating np.broadcast_shapes(), np.broadcast_to(), and related
>functions.
>- Extending the existing logic that handles expanding 1 to support
>factors as well.
>- Ensuring backward compatibility and maintaining performance.
>
> *Conclusion*
>
> I s