[Numpy-discussion] Expected behavior of np.array(..., copy=True)

2024-10-08 Thread Kevin Sheppard via NumPy-Discussion
Can anyone shed some light on the expected behavior of code using
array(..., copy=True) with pandas objects? We ran into this in statsmodels
and I think there are probably plenty of places where we explicitly call
array(..., copy=True) and think we should have a totally independent copy
of the data. One workaround is to use np.require(...,requirements="O") but
it would help to understand the expected behavior.

Here is a simple example:

import numpy as np
import pandas as pd

weeks = 2
now = pd.to_datetime('2024-01-01')
testdata = pd.DataFrame(columns=['dates', 'values'])
rg = np.random.default_rng(0)
testdata['dates'] = pd.date_range(start=now, periods=weeks * 7, freq='D')
testdata['values'] = rg.integers(0, 100, size=(weeks * 7))

values = testdata['values']
print("*"*10, " Before ", "*"*10)
print(values.head())
arr = np.array(values, copy=True)
arr.sort()
print("*"*10, " After ", "*"*10)
print(values.head())
print("*"*10, " Flags ", "*"*10)
print(arr.flags)

This produces

**********  Before  **********
0    85
1    63
2    51
3    26
4    30
Name: values, dtype: int64
**********  After  **********
0     1
1     4
2     7
3    17
4    26
Name: values, dtype: int64
**********  Flags  **********
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

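For reference, a minimal sketch of the np.require workaround mentioned above. It uses a plain NumPy view rather than a pandas object (an assumption made to keep the example self-contained), and it only illustrates that requesting the "O" (OWNDATA) flag yields an array with its own buffer. It does not explain the array(..., copy=True) behavior.

```python
import numpy as np

base = np.arange(5)
view = base[::-1]  # a view of base: it does not own its buffer

# Requesting "O" (OWNDATA) makes np.require copy when the input is a view.
owned = np.require(view, requirements="O")

print(owned.flags.owndata)            # True
print(np.shares_memory(owned, base))  # False

owned.sort()  # in-place sort touches only the copy
print(base)   # base is unchanged
```
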
Thanks,
Kevin


[Numpy-discussion] Re: Proposal to Extend NumPy Broadcasting for (D₁, D₂, ..., N, M) → (D₁, D₂, ..., K, M) When K is a Multiple of N (K % N == 0)

2025-03-25 Thread Kevin Sheppard via NumPy-Discussion
I think one aspect that would be hard to think about is how the tiling
would happen when broadcasting from K -> N where N/K is an integer. There
are at least 2 different ways to tile that would produce different results.
Suppose you have the array

[[1, 2, 3]]

which is (1,3).  If you wanted to broadcast to (1,9), you could want

[[1,1,1,2,2,2,3,3,3]] or [[1,2,3,1,2,3,1,2,3]]

This ambiguity doesn't arise with broadcasting from 1->N. Probably better
to allow users to manually tile in this case.
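
As a rough sketch of the two readings, using existing NumPy functions (np.repeat and np.tile, chosen here for illustration):

```python
import numpy as np

a = np.array([[1, 2, 3]])  # shape (1, 3)

# Reading 1: repeat each element in place along the last axis.
repeated = np.repeat(a, 3, axis=1)
print(repeated)  # [[1 1 1 2 2 2 3 3 3]]

# Reading 2: repeat the whole row end to end.
tiled = np.tile(a, (1, 3))
print(tiled)     # [[1 2 3 1 2 3 1 2 3]]
```

Both results have shape (1, 9), so the target shape alone cannot tell the two apart.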

Kevin


On Tue, Mar 25, 2025 at 1:31 PM Shasang Thummar via NumPy-Discussion <
numpy-discussion@python.org> wrote:

> Dear NumPy Developers,
>
>
> I hope you are doing well. I am writing to propose an enhancement to NumPy’s 
> broadcasting mechanism that could make the library even more powerful, 
> intuitive, and flexible while maintaining its memory efficiency.
>
> ## **Current Broadcasting Rule and Its Limitation**
> As per NumPy's current broadcasting rules, if two arrays have different 
> shapes, the smaller array can be expanded **only if one of its dimensions is 
> 1**. This allows memory-efficient expansion of data without copying. However, 
> if a dimension is greater than 1, NumPy **does not** allow expansion to a 
> larger size, even when the larger size is a multiple of the smaller size.
>
> ### **Example of Current Behavior (Allowed Expansion)**
> ```python
> import numpy as np
>
> A = np.array([[1, 2, 3]])  # Shape (1,3)
> B = np.array([[4, 5, 6],   # Shape (2,3)
>               [7, 8, 9]])
>
> C = A + B  # Broadcasting works because (1,3) can expand to (2,3)
> print(C)
> ```
>
> *Output:*
>
> [[ 5  7  9]
>  [ 8 10 12]]
>
> Here, A has shape (1,3), which is automatically expanded to (2,3) without
> copying because *a dimension of size 1 can be stretched*.
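
The "stretched without copying" claim can be checked with np.broadcast_to, which exposes the same mechanism as a read-only view. This is a small sketch; the variable names are illustrative only.

```python
import numpy as np

A = np.array([[1, 2, 3]])  # shape (1, 3)

# Stretch the size-1 axis to length 2 without copying.
stretched = np.broadcast_to(A, (2, 3))

print(stretched.strides[0])            # 0: every row maps to the same memory
print(np.shares_memory(A, stretched))  # True
print(stretched.flags.writeable)       # False: the view is read-only
```

A stride of 0 on the stretched axis is what makes the expansion free: stepping along that axis never moves through memory.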
>
> However, *NumPy fails when trying to expand a dimension where N > 1, even
> if the larger size is a multiple of N*.
>
> *Example of a Case That Fails (Even Though It Could Work)*
>
> A = np.array([[1, 2, 3],   # Shape (2,3)
>               [4, 5, 6]])
>
> B = np.array([[10, 20, 30],  # Shape (4,3)
>               [40, 50, 60],
>               [70, 80, 90],
>               [100, 110, 120]])
>
> C = A + B  # Error! NumPy does not allow (2,3) to (4,3)
>
> *This fails with the error:*
>
> ValueError: operands could not be broadcast together with shapes (2,3) (4,3)
>
> *Why Should This Be Allowed?*
>
> If *a larger dimension is an exact multiple of a smaller one*, then
> logically, the smaller array can be *repeated* along that axis *without
> physically copying data*—just like NumPy does when broadcasting (1,M) to
> (N,M).
>
> In the above example,
>
> - A has shape (2,3), and B has shape (4,3).
> - Since 4 is *a multiple of* 2 (4 % 2 == 0), A could be *logically
>   repeated 2 times* to become (4,3).
> - NumPy *already does* a similar expansion when a dimension is 1, so
>   why not extend the same logic?
>
> *Proposed Behavior (Expanding N → M When M % N == 0)*
>
> Allow an axis with size N to be broadcast to size M *if and only if M is
> an exact multiple of N (M % N == 0)*. This is *just as memory-efficient*
> as the current broadcasting rules because it can be done using *stride
> tricks instead of copying data*.
>
> *Example of the Proposed Behavior*
>
> If NumPy allowed this new form of broadcasting:
>
> A = np.array([[1, 2, 3],   # Shape (2,3)
>               [4, 5, 6]])
>
> B = np.array([[10, 20, 30],  # Shape (4,3)
>               [40, 50, 60],
>               [70, 80, 90],
>               [100, 110, 120]])
>
> # Proposed new broadcasting rule
> C = A + B
>
> print(C)
>
> *Expected Output:*
>
> [[ 11  22  33]
>  [ 44  55  66]
>  [ 71  82  93]
>  [104 115 126]]
>
> This works by *logically repeating A* to match B’s shape (4,3).
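
For comparison, the expected output above can already be obtained under the current rules by reshaping B into blocks of A's height, so that only a size-1 axis needs stretching. This is a sketch of one possible workaround, not part of the proposal.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
B = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90],
              [100, 110, 120]])  # shape (4, 3)

n = A.shape[0]

# View B as (2, 2, 3): blocks of n rows. A then broadcasts over the
# leading block axis under the existing rules.
C = (B.reshape(-1, n, B.shape[1]) + A).reshape(B.shape)
print(C)
```

This corresponds to the tiled reading (A repeated as a whole block), which is the one the expected output assumes.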
>
> *Why This is a Natural Extension of Broadcasting*
>
> - *Memory Efficiency:* Just like broadcasting (1,M) to (N,M), this
>   expansion does *not* require physically copying data. Instead, strides
>   can be adjusted to *logically repeat elements*, making this as
>   efficient as current broadcasting.
> - *Intuitiveness:* Right now, broadcasting already surprises new
>   users. If (1,3) can become (2,3), why not (2,3) to (4,3) when 4 is a
>   multiple of 2? This would be a more intuitive rule.
> - *Extends Current Functionality:* This is *not* a new concept; it *extends
>   the existing rule* where 1 can be stretched to any value. We are
>   generalizing it to *any factor relationship* (N → M when M % N == 0).
>
> *Implementation Considerations*
>
> The logic behind NumPy’s current broadcasting already uses *stride tricks*
> for memory-efficient expansion. Extending it to handle (N, M) → (K, M)
> (where K % N == 0) would require:
>
> - Updating np.broadcast_shapes(), np.broadcast_to(), and related
>   functions.
> - Extending the existing logic that handles expanding 1 to support
>   factors as well.
> - Ensuring backward compatibility and maintaining performance.
>
> *Conclusion*
>
> I s