[Numpy-discussion] Re: Endorsing SPECs 1, 6, 7, and 8
On Tue, Oct 8, 2024 at 8:36 AM Nathan via NumPy-Discussion <numpy-discussion@python.org> wrote:

> Since the legacy RNG interface cannot be deprecated and we encourage
> downstream to use it in tests according to the text of NEP 19, I'm not sure
> about the text in SPEC 7 that talks about deprecating using legacy RNGs. Or
> are you saying that we have now reached the point where we can update NEP
> 19 to encourage moving away from the legacy interface?

We have always encouraged people to move away from the legacy interface in their APIs. SPEC 7 recommends a principled way for downstream projects to implement that move.

NEP 19 acknowledged that one might still have a use case for creating a legacy RandomState object and calling it in tests to generate test data (but not otherwise passing that RandomState object to the code under test); that's not what SPEC 7 addresses. NEP 19 doesn't actively recommend the use of RandomState for this purpose; it just acknowledges that it's a valid use case that numpy will continue to support even while we push for the exclusive use of Generator inside library/program code. NEP 19 doesn't need an update for us to endorse SPEC 7 (whether it needs one, separately, to clarify its intent is another question).

--
Robert Kern
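(For reference, the transition SPEC 7 recommends for downstream APIs looks roughly like the sketch below; the function name is illustrative, and consult the SPEC text itself for the deprecation mechanics.)

```
import numpy as np

def my_downstream_function(data, *, rng=None):
    # SPEC 7 pattern: accept None, an int seed, or an existing Generator,
    # and normalize at the API boundary. It is the acceptance of legacy
    # RandomState inputs that gets deprecated, not seeding itself.
    rng = np.random.default_rng(rng)
    return data + rng.standard_normal(len(data))
```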
[Numpy-discussion] Re: Canonical way of serialising Generators
On Mon, Oct 28, 2024 at 12:11 AM Andrew Nelson via NumPy-Discussion <numpy-discussion@python.org> wrote:

> Hi all,
> is there a canonical way of serialising Generators (not via pickle)?

Pickle is the canonical way. In particular, the pickle _machinery_ is going to be the single source of truth for serialization. But I'll recapitulate the important details here.

`Generator` itself is a stateless wrapper around the `BitGenerator`, so you can ignore it and focus on the `BitGenerator` and its `SeedSequence` (and the latter, really, only if you need to continue spawning). `BitGenerator.__getstate__()` will give you `(bg_state_dict, seed_seq)` (a `dict` and the `SeedSequence`). The `BitGenerator` state is going to be a simple dict, but there will be arbitrary-sized ints and/or numpy arrays in there. Third-party `BitGenerator`s can do whatever they like, so pickle is about your only fallback there. For numpy `BitGenerator`s, the name of the class will be in the key `'bit_generator'`, but for third-party ones, there's no place for you to look by name.

Notionally, `BitGenerator`s can accept any `ISeedSequence` implementation, to allow for other kinds of seed-massaging algorithms, but we haven't seen much (any) interest in that, so you can probably safely consider only `SeedSequence` proper. Because Cython's default pickle machinery under the covers was sufficient, we don't have an explicit `__getstate__()` for you to peruse, but you can look at our constructor, which indicates what can be passed; each argument ends up as an attribute with the same name. You'll want all of them for serialization purposes.

Note that `entropy` can be an arbitrary-sized int or a sequence of arbitrary-sized ints. If a sequence, it could be a list or a tuple, but this does not need to be preserved; any sequence type will do on deserialization. `spawn_key` should always be a tuple of bounded-size ints (each should fit into a uint32, but will be a plain Python int). `pool_size` is important configuration, though rarely modified (it should be either 4 or 8, but notionally folks could choose otherwise). `n_children_spawned` is important state if there has been spawning and you want to continue spawning later correctly/reproducibly. But that's it.

So your ultimate serialization needs to handle arbitrary-sized integers, lists/tuples of integers, and numpy arrays (of various integer dtypes). JSON with a careful encoder/decoder that can handle those cases would be fine.

> Would the following be reasonable for saving and restoring state:

No, you're missing all of the actual state. The `entropy` of the `SeedSequence` is the original user-input seed, not the current state of the `BitGenerator`.

> Specifically I'm interested in a safe way (i.e. no pickle) of
> saving/restoring Generator state via HDF5 file storage.

By restricting your domain to only numpy-provided `BitGenerator`s and true `SeedSequence`s, you can do this. In full generality, one cannot.
Untested:

```
import numpy as np

def rng_dict(rng):
    bg_state = rng.bit_generator.state
    ss = rng.bit_generator.seed_seq
    ss_dict = dict(
        entropy=ss.entropy,
        spawn_key=ss.spawn_key,
        pool_size=ss.pool_size,
        n_children_spawned=ss.n_children_spawned,
    )
    return dict(bg_state=bg_state, seed_seq=ss_dict)

def rng_fromdict(d):
    bg_state = d['bg_state']
    ss = np.random.SeedSequence(**d['seed_seq'])
    bg = getattr(np.random, bg_state['bit_generator'])(ss)
    bg.state = bg_state
    rng = np.random.Generator(bg)
    return rng
```

--
Robert Kern
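(Following the JSON suggestion above, a sketch of a round trip for that dict. The helper names here are hypothetical, and this assumes the numpy-provided bit generators. Python's json module handles arbitrary-sized Python ints natively, so only numpy arrays and numpy integer scalars need special treatment.)

```
import json
import numpy as np

def _default(obj):
    # BitGenerator state dicts can contain numpy arrays (e.g. MT19937)
    # and numpy integer scalars; tag arrays, coerce scalars to Python ints.
    if isinstance(obj, np.ndarray):
        return {"__ndarray__": obj.tolist(), "dtype": obj.dtype.str}
    if isinstance(obj, np.integer):
        return int(obj)
    raise TypeError(f"cannot serialize {type(obj)}")

def _object_hook(d):
    if "__ndarray__" in d:
        return np.array(d["__ndarray__"], dtype=d["dtype"])
    return d

rng = np.random.default_rng(12345)
s = json.dumps(rng_dict(rng), default=_default)
rng2 = rng_fromdict(json.loads(s, object_hook=_object_hook))
# PCG64's state dict holds plain ints, so direct comparison works here.
assert rng.bit_generator.state == rng2.bit_generator.state
```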
[Numpy-discussion] Re: Suggestion to show the shape in repr for summarized arrays
On Tue, Oct 1, 2024 at 9:57 AM Evgeni Burovski via NumPy-Discussion <numpy-discussion@python.org> wrote:

> I like this, too.
>
> And I think the trailing comment, # shape=..., is much better, as it gets
> the best of both worlds: users get the info, and tools which do
> eval(repr(..)) only need to learn to ignore the comment. For one, numpy's
> own doctests will keep working with no churn since scipy_doctest handles
> this already.

Given that we've already chosen to use the fake `shape=...` keyword when there is a 0-length axis, what do you think we should do, consistency-wise?

>>> np.empty([10, 0])
array([], shape=(10, 0), dtype=float64)

1. Follow the precedent and use the fake `shape=...` keyword in the summarized-array case.
2. Ignore the precedent, use a trailing `# shape=...` comment in the summarized-array case, and leave the 0-length-axis case alone.
3. Fix the 0-length-axis case to use the trailing `# shape=...` comment too.

--
Robert Kern
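(For concreteness, a mock-up of how the two candidates might render for a summarized array; spacing is approximate and neither form is implemented at the time of writing.)

```
>>> np.arange(1001)  # current behavior: summarized, shape not shown
array([   0,    1,    2, ...,  998,  999, 1000])

# Option 1: fake keyword, matching the 0-length-axis precedent
array([   0,    1,    2, ...,  998,  999, 1000], shape=(1001,))

# Option 2: trailing comment
array([   0,    1,    2, ...,  998,  999, 1000])  # shape=(1001,)
```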
[Numpy-discussion] Re: [EXTERNAL] Copy on __setitem__
On Thu, Dec 26, 2024 at 5:48 PM Benjamin Root via NumPy-Discussion <numpy-discussion@python.org> wrote:

> Seems to make sense to me? Or is the following a bug?
>
> >>> import numpy as np
> >>> u = np.zeros(5)
> >>> v = np.ones(5)
> >>> u
> array([0., 0., 0., 0., 0.])
> >>> u[...] = v
> >>> u
> array([1., 1., 1., 1., 1.])
> >>> v[4] = 5
> >>> v
> array([1., 1., 1., 1., 5.])
> >>> u
> array([1., 1., 1., 1., 1.])
>
> If you don't do a copy, then it is a view, right? And so, should modifying
> v[4] change u[4]? Relatedly, if these were object arrays of mutable
> objects, should u[4] point to the exact same instance as v[4], so that
> mutating it in one array means that the other array "sees" the same changes?

No one is suggesting that behavior should change. The issue is whether or not an unnecessary, _additional_ copy gets made when converting a not-exactly-an-`ndarray` object to an `ndarray` object before doing the assignment (which always copies values over to the destination array).

--
Robert Kern
[Numpy-discussion] Re: surprising behavior from array indexing
On Mon, Dec 30, 2024 at 10:28 AM Mark Harfouche via NumPy-Discussion <numpy-discussion@python.org> wrote:

> Happy new year everybody!
>
> I've been upgrading my code to start to support array indexing, and in my
> tests I found something that was well documented, but surprising to me.
>
> I've tried to read through
> https://numpy.org/doc/stable/user/basics.indexing.html#combining-advanced-and-basic-indexing
> and even after multiple passes, I still find it very terse...
>
> Consider a multi-dimensional dataset:
>
> import numpy as np
> shape = (10, 20, 30)
> original = np.arange(np.prod(shape)).reshape(shape)
>
> Say we want to collapse dim 0 to a single entry,
> take a subset of dim 1 with a slice,
> and take 3 elements from dim 2:
>
> i = 2
> j = slice(1, 6)
> k = slice(7, 10)
> out_basic = original[i, j, k]
> assert out_basic.shape == (5, 3)
>
> Now say we want the freedom to pass an arbitrary "array" for k instead of
> a slice:
>
> k = [7, 11, 13]
> out_array = original[i, j, k]
> assert out_array.shape == (5, 3), f"shape is actually {out_array.shape}"
>
> AssertionError: shape is actually (3, 5)
>
> This is very surprising to me. I totally understand that things won't
> change in terms of this kind of indexing in numpy, but is there a way I
> can adjust my indexing strategy to regain the ability to slice into my
> array in a "single shot"?
>
> The main use case is arrays that are truly huge, but chunked in ways
> where slicing into them can be quite efficient. This is multi-dimensional
> imaging data. Each chunk is quite "huge", so this kind of metadata
> manipulation is worthwhile to avoid unnecessary IO.
>
> Perhaps there is a "simple" distinction I am missing, for example using a
> tuple for k instead of a list.

No, there's no simple solution that you're missing. The kind of indexing that you want has been considered in NEP 21 (which called it "orthogonal indexing"), which saw some progress, but has largely been left fallow. There's been some movement on the Array API specification side, so that might spur some movement.

https://numpy.org/neps/nep-0021-advanced-indexing.html

Jaime Rio made a pure Python implementation that might work for you (though I'm not sure about the performance for large arrays with big slices), but it's buried in a closed PR (it still works, though):

https://github.com/numpy/numpy/pull/5749/files

>>> original_oindex = OrthogonalIndexer(original)
>>> original_oindex[i, j, k]
array([[1237, 1241, 1243],
       [1267, 1271, 1273],
       [1297, 1301, 1303],
       [1327, 1331, 1333],
       [1357, 1361, 1363]])

Note that some other array implementations like Xarray and Zarr provide the `.oindex` property, which implements (roughly) the orthogonal indexing semantics of NEP 21, so those might be options for you.

--
Robert Kern
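(Not mentioned above, but with plain numpy you can also get the orthogonal semantics via `np.ix_` by expanding the slice into an explicit index array first. A sketch:)

```
import numpy as np

shape = (10, 20, 30)
original = np.arange(np.prod(shape)).reshape(shape)
i, j, k = 2, slice(1, 6), [7, 11, 13]

# np.ix_ builds an open mesh from 1-D integer sequences, giving
# orthogonal indexing; apply the scalar index separately and turn
# the slice into explicit indices.
jj = np.arange(*j.indices(original.shape[1]))
out = original[i][np.ix_(jj, k)]
assert out.shape == (5, 3)
```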
[Numpy-discussion] Re: odd behaviour of meshgrid with 3 arrays
On Thu, Nov 21, 2024 at 2:00 PM Slavin, Jonathan via NumPy-Discussion <numpy-discussion@python.org> wrote:

> Hi all,
>
> I was trying to use meshgrid with three arrays and got some odd results.
> Here's a simple example:
>
> xt = np.array([1,2,3,4])
> yt = np.array([6,7,8])
> zt = np.array([12,13])
> xxx,yyy,zzz = np.meshgrid(xt,yt,zt)
>
> I would expect xxx[0,0,:] = array([1,2,3,4]); instead I get
> xxx[0,0,:] = array([1,1]) and xxx[0,:,0] = array([1,2,3,4]).
> Also yyy[:,0,0] = array([6,7,8]), whereas I would expect
> yyy[0,:,0] = array([6,7,8]).
> So what's going on? This seems like a bug to me.
> Any suggestions for getting what I wanted -- i.e. xxx.shape = (2,3,4),
> with values as appropriate?

This is documented in the `Notes` section concerning the behavior of the default `indexing='xy'`, which is primarily for 2D arrays (and presumably follows a convention from another language like MATLAB). In order to get the (2, 3, 4) arrays that you want, use `indexing='ij'`:

# Note the flipped order of inputs and outputs since you want the
# `zt` values across the first axis.
zzz, yyy, xxx = np.meshgrid(zt, yt, xt, indexing='ij')

--
Robert Kern
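(A quick check of that suggestion, as an illustrative test:)

```
import numpy as np

xt = np.array([1, 2, 3, 4])
yt = np.array([6, 7, 8])
zt = np.array([12, 13])

zzz, yyy, xxx = np.meshgrid(zt, yt, xt, indexing='ij')
assert xxx.shape == (2, 3, 4)
# With 'ij' indexing, each coordinate varies along its own axis.
assert (zzz[:, 0, 0] == zt).all()
assert (yyy[0, :, 0] == yt).all()
assert (xxx[0, 0, :] == xt).all()
```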
[Numpy-discussion] Re: Proposal to Extend NumPy Broadcasting for (D₁, D₂, ..., N, M) → (D₁, D₂, ..., K, M) When K is a Multiple of N (K % N == 0)
On Tue, Mar 25, 2025 at 9:34 AM Shasang Thummar via NumPy-Discussion <numpy-discussion@python.org> wrote:

> *Why Should This Be Allowed?*
>
> If *a larger dimension is an exact multiple of a smaller one*, then
> logically, the smaller array can be *repeated* along that axis *without
> physically copying data* -- just like NumPy does when broadcasting (1,M)
> to (N,M).
>
> In the above example,
>
> - A has shape (2,3), and B has shape (4,3).
> - Since 4 is *a multiple of* 2 (4 % 2 == 0), A could be *logically
>   repeated 2 times* to become (4,3).
> - NumPy *already does* a similar expansion when a dimension is 1, so
>   why not extend the same logic?

It's not a particularly similar expansion. The reason broadcasting a 1-dimension works is that we just set the stride for the broadcasted dimension to 0, so the pointer never advances along that dimension and just keeps pointing at the one value along that axis. You can't do the same thing with tiling n>1; the pointer would have to advance within the tile and then go back. So, unlike current broadcasting, I don't think we can just adjust the strides on each operand separately to match the broadcasted shape, which is something the current machinery relies upon.

You could reshape `B` to be (2, 2, 3), broadcast as normal, operate, then reshape the result back. I'm not sure it's guaranteed in all cases that the reshaping back can be done without copying, and I have no clear idea of how this works in the ternary+ operator case. But even so, this is not a straightforward extension of the broadcasting functionality and would not just plug in to the rest of the broadcasting machinery.

I could be wrong; I haven't put too much effort into it, but I think the first step would be to walk through what the stride tricks actually would be to get this operation to work. Then we can talk about whether or not it's desirable to plug it in.

--
Robert Kern
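(A sketch of the reshape workaround described above for the binary case; the concrete shapes are taken from the proposal's example.)

```
import numpy as np

A = np.arange(6).reshape(2, 3)    # shape (2, 3)
B = np.arange(12).reshape(4, 3)   # shape (4, 3); 4 % 2 == 0

# Split B's leading axis into (n_tiles, N) so ordinary broadcasting
# applies against A, then flatten the result back to (4, 3).
result = (B.reshape(2, 2, 3) + A).reshape(4, 3)

# Same values as physically tiling A.
assert (result == B + np.tile(A, (2, 1))).all()
```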
[Numpy-discussion] Re: Add Sampling with Dynamic weights with replacement to np.random.choice
On Fri, May 2, 2025 at 9:55 AM Shourya Jain via NumPy-Discussion <numpy-discussion@python.org> wrote:

> Feature request: Add a replacement-based sampling method with weight
> decay for choosing samples from a list/array.
>
> I would like to add it with an algorithm as follows:
>
> for _ in range(target_size):
>     idx = np.random.choice(len(a), p=p)  # Get a sample index
>     samples.append(a[idx])               # Append sample to samples list
>     p[idx] *= decay_factor               # Update probabilities
>     p /= np.sum(p)                       # Normalize probabilities
>
> This provides a more representative sample than normal sampling with
> replacement, as elements that have not been sampled get a higher
> probability of being sampled later.
> I would like to add it to np.random.choice itself, by adding a parameter
> decay_factor that defaults to 1.

This seems like an ad hoc algorithm rather than a well-known, well-studied random process. I'm happy to be corrected with references to the literature, but even if there are references, I'm happy to leave this for users to implement on their own.

--
Robert Kern
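(For anyone who does want this, a self-contained user-side sketch of the proposed algorithm; the function name and signature are illustrative, not a numpy API.)

```
import numpy as np

def choice_with_decay(a, size, p, decay_factor=0.5, rng=None):
    """Sample with replacement, multiplying each drawn element's weight
    by `decay_factor` after every draw so unseen elements catch up."""
    rng = np.random.default_rng(rng)
    a = np.asarray(a)
    p = np.array(p, dtype=float)
    p /= p.sum()
    out = np.empty(size, dtype=a.dtype)
    for i in range(size):
        idx = rng.choice(len(a), p=p)
        out[i] = a[idx]
        p[idx] *= decay_factor
        p /= p.sum()  # renormalize after the decay
    return out

samples = choice_with_decay(np.arange(10), size=20, p=np.full(10, 0.1))
```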
[Numpy-discussion] Re: Format of arrays to facilitate analysis
On Tue, Feb 18, 2025 at 11:06 AM Dan Patterson wrote:

> I tend to work with Nx2 arrays representing coordinate geometry.
> I have examined a number of packages and there are no guidelines as to
> why a certain arrangement is preferred over another.
> For example: a rectangle, coordinates ordered clockwise with the first
> and last the same to ensure closure of the geometry.

That's the great thing about conventions; there are so many to choose from! /sarcasm

Usually, there is nothing substantial that would cause one to strongly prefer one convention over another. I usually take a look at existing libraries that I might want to use and pick a convention that makes it easy to use those libraries. For use cases like this, I might want to use Shapely for some robust geometrical operations, so I'd follow Shapely's conventions. But if I'm just drawing polygons onto images, I might follow scikit-image's convention instead. By and large, you can usually safely follow your personal preference without fearing that you are missing out on something vital. What makes these mere conventions is the low stakes.

--
Robert Kern
[Numpy-discussion] Re: Format of arrays to facilitate analysis
On Tue, Feb 18, 2025 at 3:00 PM Dan Patterson wrote:

> So from a purely numpy perspective, there is no advantage to any one of
> the aforementioned coordinate arrangements (e.g. Nx2, ravelled, 2xN).

No, not from numpy's side. Some things you want to do will be easier/prettier in one of the arrangements than the others (e.g. the ravelled one will likely be the least convenient), but the other code and file formats that you interact with will be the largest factors in breaking the symmetry.

--
Robert Kern
[Numpy-discussion] Re: Functions for secant, cosecant, and cotangent
On Wed, Feb 12, 2025 at 5:12 PM Jayanth Tumuluri via NumPy-Discussion <numpy-discussion@python.org> wrote:

> Hello,
>
> I noticed that current/past installations of Numpy do not have functions
> for secant, cosecant, and cotangent. Sometimes there's a reason to use
> these over their less elegant forms of 1/cos, 1/sin, and 1/tan. Could
> this be something to be included in future installations of Numpy? I'd
> really appreciate it!

As for numpy itself, we've generally held the line for special/trig functions: we add those that are in the C99 standard and leave the rest for scipy.special. These days, we defer to the Array API (if they add a new elementwise function, we're likely going to add it), but they have a roughly similar standard, though some functions that are really useful in neural networks have also made the leap.

As for scipy.special, one of the considerations is whether there are implementations of those functions that do anything other than just take the reciprocal of the result of the main trig function. Taking a look at the special function libraries that we might tap for this, I'm not seeing any that implement these functions (the exception being the cotangent in degrees, which Cephes implements and we already expose in scipy.special).

--
Robert Kern
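(If the reciprocal forms are all that's needed, a minimal user-side sketch; these are not numpy or scipy APIs, apart from the degree-based cotangent `scipy.special.cotdg` mentioned above.)

```
import numpy as np

def sec(x):
    """Secant: 1/cos(x)."""
    return 1.0 / np.cos(x)

def csc(x):
    """Cosecant: 1/sin(x)."""
    return 1.0 / np.sin(x)

def cot(x):
    """Cotangent, computed as cos/sin rather than 1/tan to avoid
    one extra division and its rounding step."""
    return np.cos(x) / np.sin(x)
```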
[Numpy-discussion] Re: Feature Request: Left (and Right) inverse of the Cross Product
On Mon, Jun 2, 2025 at 1:09 PM Oscar Benjamin via NumPy-Discussion <numpy-discussion@python.org> wrote:

> In vectors, the minimum-norm solution for b in a x b = c is just
> bhat = (c x a) / |a|^2, so the cross product computes its own "inverse":
>
> In [135]: np.cross(c, a) / (a**2).sum()
> Out[135]: array([-0.93244529, 0.74775365, 1.09117371])

Delightful!

--
Robert Kern
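(A quick illustrative check: since a . c = 0 when c = a x b, expanding a x ((c x a) / |a|^2) with the BAC-CAB identity gives back c exactly; bhat differs from b only by a multiple of a.)

```
import numpy as np

rng = np.random.default_rng(42)
a = rng.standard_normal(3)
b = rng.standard_normal(3)
c = np.cross(a, b)

# Minimum-norm right-inverse of the cross product.
bhat = np.cross(c, a) / (a @ a)
assert np.allclose(np.cross(a, bhat), c)
```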
[Numpy-discussion] Re: Feature Request: Left (and Right) inverse of the Cross Product
On Mon, Jun 2, 2025 at 9:30 AM Leon Deligny via NumPy-Discussion <numpy-discussion@python.org> wrote:

> ### Proposed new feature or change:
>
> Motivations: This is specific to 3D vector algebra. In fluid dynamics, we
> have access to the moment of a force at a specific point (M_P = OP \cross
> F). This calculation is crucial when determining the center of pressure
> (CoP), a pivotal concept for understanding the distribution of forces on
> an object submerged in a fluid. To accurately pinpoint the CoP, we often
> need to "reverse" this process, effectively requiring an inverse
> functionality for the cross product.
>
> How to compute the right-inverse: Given two real vectors a and c, we want
> to find b such that c = a \cross b. If we write this equation in matrix
> form, then we need to define
>
> ```latex
> A = \begin{pmatrix}
> 0 & -a_3 & a_2 \\
> a_3 & 0 & -a_1 \\
> -a_2 & a_1 & 0
> \end{pmatrix}
> ```
>
> to get $c = A \cdot b$ (where $\cdot$ is matrix multiplication). In
> real-case scenarios, there need not exist a vector b such that $c = A
> \cdot b$, but we can always find a vector b such that $|c - A \cdot
> b|_2^2$ is minimal (i.e., b is the best approximation $c \approx A \cdot
> b$).
>
> When minimizing $|c - A \cdot b|_2^2$, there exists an analytical
> solution found with the Gaussian elimination procedure (we are working
> with 3x3 matrices and 3-dimensional vectors).
>
> I did not find any literature on this subject specifically, but this
> would be a greatly appreciated feature.

Cf. https://github.com/numpy/numpy/issues/29110

To be clear, I still don't think we're entertaining a feature request for inclusion in numpy, but we on the mailing list can perhaps help you implement this in your own code.

Note that because `A` is singular, there are an infinite number of `b` values that will meet the criterion. The problem is not reducing the residual (you can always get to 0, or near enough in floating point terms); the problem is picking which of the infinitely many valid solutions you want. `np.linalg.lstsq()` will give you the minimum-norm solution, which is usually a pretty good option, though it may or may not suit your specific application.

>>> import numpy as np
>>> rng = np.random.default_rng(0x7cadf78fc1ad960cdb91bb4745b84b28)
>>> av = rng.multivariate_normal(np.zeros(3), np.eye(3), size=10)
>>> bv = rng.multivariate_normal(np.zeros(3), np.eye(3), size=10)
>>> cv = np.cross(av, bv)
>>> cv
array([[ 0.78079691, -0.61265949,  0.43715586],
       [ 0.0508124 ,  2.3512873 ,  0.17093971],
       [ 0.39231732,  0.35281331,  0.90361588],
       [-0.36610523,  0.24177299,  0.49421533],
       [-0.51625536,  1.69312288, -0.9220552 ],
       [-0.59298882,  0.60810492,  2.38083992],
       [ 0.17308949, -0.59307124, -0.7205352 ],
       [-0.17391941, -0.43438387, -0.451717  ],
       [ 1.15385801, -0.80682863, -1.4479563 ],
       [-1.77839153, -2.30249409,  0.05814433]])
>>> # Fancy way to compute the A matrix corresponding to the operation `np.cross(a, ...)`
>>> A = np.cross(av[0], -np.eye(3))
>>> A @ bv[0]
array([ 0.78079691, -0.61265949,  0.43715586])
>>> cv[0]
array([ 0.78079691, -0.61265949,  0.43715586])
>>> for a, b in zip(av, bv):
...     c = np.cross(a, b)
...     A = np.cross(a, -np.eye(3))
...     bhat = np.linalg.lstsq(A, c)[0]
...     res = c - np.cross(a, bhat)
...     print(res)
...
[-2.22044605e-16 0.e+00 -1.66533454e-16]
[7.56339436e-16 8.88178420e-16 2.77555756e-16]
[-5.55111512e-17 -2.77555756e-16 -2.22044605e-16]
[-1.66533454e-16 -5.55111512e-17 0.e+00]
[0.e+00 6.66133815e-16 3.33066907e-16]
[-1.11022302e-16 0.e+00 -4.44089210e-16]
[ 0.e+00 -1.11022302e-16 -2.22044605e-16]
[-1.11022302e-16 1.11022302e-16 -5.55111512e-17]
[ 0.e+00 0.e+00 -2.22044605e-16]
[2.22044605e-16 4.44089210e-16 0.e+00]

--
Robert Kern
[Numpy-discussion] Re: Can't launch Jupyter
On Sat, Jun 28, 2025 at 8:42 AM Kwaku Oppong via NumPy-Discussion <numpy-discussion@python.org> wrote:

> Can someone help me? Whenever I try to launch Jupyter from Navigator,
> this is the message I get:

Sorry, this isn't the place for Jupyter questions.

--
Robert Kern