[Numpy-discussion] Re: next NumPy Optimization Team meeting

2023-11-20 Thread Inessa Pawson
Hi, everyone!
To adjust for the end of Daylight Saving Time, NumPy Optimization Team
meetings were moved to 17:00 (5pm) UTC. If you're planning on attending
today's meeting, please make sure to refresh your calendar.

On Fri, Nov 17, 2023 at 11:23 AM Inessa Pawson wrote:

> The next NumPy Optimization Team meeting will be held on Monday, November
> 20th at 4pm UTC.
> Join us via Zoom:
> https://numfocus-org.zoom.us/j/88162122074?pwd=a3h0cTZubzlLcTZzMVhCL1AxUVFMdz09
> Everyone is welcome and encouraged to attend.
> To add topics you’d like to discuss to the meeting agenda, follow this link:
> https://hackmd.io/dVdSlQ0TThWkOk0OkmGsmw?both
> For the notes from the previous meetings, visit:
> https://github.com/numpy/archive/tree/main/optim_team_meetings
>


-- 
Cheers,
Inessa

Inessa Pawson
GitHub: inessapawson
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Making np.object_ generic for better object array type hinting

2023-11-20 Thread christopher . ariza
Greetings!

Could np.object_ be made generic? For example, if I have an object array of 
two-element integer tuples, it would be ideal if I could type hint the array 
with np.dtype[np.object_[tuple[int, int]]]; this would allow us to distinguish 
between object arrays by their contained types. (I.e., 
np.dtype[np.object_[tuple[int, int]]] is not the same as 
np.dtype[np.object_[datetime.date]].)

Context: I am the lead developer of StaticFrame 
(https://github.com/static-frame/static-frame), an alternative DataFrame 
library built on an immutable data model. StaticFrame has recently made 
DataFrames (and other containers) generic 
(https://towardsdatascience.com/type-hinting-dataframes-for-static-analysis-and-runtime-validation-3dedd2df481d).
When specifying columnar types, we use NumPy generic types. While this 
provides a great deal of flexibility for most types, when an object array is 
involved, we cannot express anything about what is contained in the object 
array. Making np.object_ generic would solve this issue.
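
A minimal sketch of the idea, using only the standard library. `Object_` and `DType` below are hypothetical stand-ins for a generic `np.object_` and for `np.dtype`; they are not NumPy classes, and `contained_type` is an illustrative helper. The point is only that, once the scalar type carries a parameter, two object-array hints can be told apart:

```python
import datetime
from typing import Generic, TypeVar, get_args, get_origin

T = TypeVar("T")


class Object_(Generic[T]):
    """Hypothetical stand-in for a generic np.object_ (not NumPy's class)."""


class DType(Generic[T]):
    """Hypothetical stand-in for np.dtype, which is already generic."""


# The two hints from the example above, spelled with the stand-ins.
TupleDType = DType[Object_[tuple[int, int]]]
DateDType = DType[Object_[datetime.date]]


def contained_type(dtype_hint):
    """Return X from a DType[Object_[X]] hint."""
    (scalar_hint,) = get_args(dtype_hint)
    if get_origin(scalar_hint) is not Object_:
        raise TypeError("not an object dtype hint")
    (element,) = get_args(scalar_hint)
    return element


print(contained_type(TupleDType))   # tuple[int, int]
print(contained_type(DateDType))    # <class 'datetime.date'>
print(TupleDType == DateDType)      # False
```

A static checker or a runtime validator could use the same parameter to reject one hint where the other is expected.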

I am happy to create an issue and explore a PR if this seems like a good 
enhancement.


[Numpy-discussion] Re: *New Time* Next Documentation team meeting

2023-11-20 Thread Hetav
Hi Mukulika,

I would love to join today's meeting. Can I get the passcode for it?

Regards,
Hetav Pandya


On Sun, Nov 19, 2023 at 9:07 PM Mukulika Pahari wrote:

> Hi all,
>
> Our next Documentation Team meeting will happen on *Monday, November 20*
> at *11PM UTC*. If this time slot is inconvenient for you to join, please
> let me know in the replies or Slack and we will try to add another time
> slot.
>
> All are welcome - you don't need to already be a contributor to join. If
> you have questions or are curious about what we're doing, we'll be happy to
> meet you!
>
> If you wish to join on Zoom, use this (updated) link:
>
> https://numfocus-org.zoom.us/j/85016474448?pwd=TWEvaWJ1SklyVEpwNXUrcHV1YmFJQ.
> ..
>
> Here's the permanent hackmd document with the meeting notes (still being
> updated):
> https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg
>
> Hope to see you around!
>
> Best wishes,
> Mukulika
>


[Numpy-discussion] pickling dtype values

2023-11-20 Thread Sebastien Binet
hi there,

I have written a Go package[1] that can read/write simple arrays in the numpy 
file format [2].
when I wrote it, it was for simple interoperability use cases, but now people 
would like to be able to read back ragged-arrays[3].

unless I am mistaken, this means I need to interpret pieces of pickled data 
(`ndarray`, `multiarray` and `dtype`).

so I am trying to understand how to unpickle `dtype` values that have been 
pickled:

```python
import numpy as np
import pickle
import pickletools as pt

pt.dis(pickle.dumps(np.dtype("int32"), protocol=4), annotate=True)
```

gives:
```
    0: \x80 PROTO      4                   Protocol version indicator.
    2: \x95 FRAME      55                  Indicate the beginning of a new frame.
   11: \x8c SHORT_BINUNICODE 'numpy'       Push a Python Unicode string object.
   18: \x94 MEMOIZE    (as 0)              Store the stack top into the memo.  The stack is not popped.
   19: \x8c SHORT_BINUNICODE 'dtype'       Push a Python Unicode string object.
   26: \x94 MEMOIZE    (as 1)              Store the stack top into the memo.  The stack is not popped.
   27: \x93 STACK_GLOBAL                   Push a global object (module.attr) on the stack.
   28: \x94 MEMOIZE    (as 2)              Store the stack top into the memo.  The stack is not popped.
   29: \x8c SHORT_BINUNICODE 'i4'          Push a Python Unicode string object.
   33: \x94 MEMOIZE    (as 3)              Store the stack top into the memo.  The stack is not popped.
   34: \x89 NEWFALSE                       Push False onto the stack.
   35: \x88 NEWTRUE                        Push True onto the stack.
   36: \x87 TUPLE3                         Build a three-tuple out of the top three items on the stack.
   37: \x94 MEMOIZE    (as 4)              Store the stack top into the memo.  The stack is not popped.
   38: R    REDUCE                         Push an object built from a callable and an argument tuple.
   39: \x94 MEMOIZE    (as 5)              Store the stack top into the memo.  The stack is not popped.
   40: (    MARK                           Push markobject onto the stack.
   41: K        BININT1    3               Push a one-byte unsigned integer.
   43: \x8c     SHORT_BINUNICODE '<'       Push a Python Unicode string object.
   46: \x94     MEMOIZE    (as 6)          Store the stack top into the memo.  The stack is not popped.
   47: N        NONE                       Push None on the stack.
   48: N        NONE                       Push None on the stack.
   49: N        NONE                       Push None on the stack.
   50: J        BININT     -1              Push a four-byte signed integer.
   55: J        BININT     -1              Push a four-byte signed integer.
   60: K        BININT1    0               Push a one-byte unsigned integer.
   62: t        TUPLE      (MARK at 40)    Build a tuple out of the topmost stack slice, after markobject.
   63: \x94 MEMOIZE    (as 7)              Store the stack top into the memo.  The stack is not popped.
   64: b    BUILD                          Finish building an object, via __setstate__ or dict update.
   65: .    STOP                           Stop the unpickling machine.
highest protocol among opcodes = 4
```

I have tried to find the usual `__reduce__` and `__setstate__` methods to 
understand what the various arguments are, to no avail.

so, in:
```python
>>> np.dtype("int32").__reduce__()[1]
('i4', False, True)
>>> np.dtype("int32").__reduce__()[2]
(3, '<', None, None, None, -1, -1, 0)
```
what is the meaning of the various arguments?

thanks in advance,
sebastien.

[1] https://github.com/sbinet/npyio
[2] https://numpy.org/neps/nep-0001-npy-format.html
[3] https://github.com/sbinet/npyio/issues/20


[Numpy-discussion] Re: pickling dtype values

2023-11-20 Thread Robert Kern
On Mon, Nov 20, 2023 at 10:08 PM Sebastien Binet wrote:

> hi there,
>
> I have written a Go package[1] that can read/write simple arrays in the
> numpy file format [2].
> when I wrote it, it was for simple interoperability use cases, but now
> people would like to be able to read back ragged-arrays[3].
>
> unless I am mistaken, this means I need to interpret pieces of pickled
> data (`ndarray`, `multiarray` and `dtype`).
>
> so I am trying to understand how to unpickle `dtype` values that have been
> pickled:
>
> ```python
> import numpy as np
> import pickle
> import pickletools as pt
>
> pt.dis(pickle.dumps(np.dtype("int32"), protocol=4), annotate=True)
> ```
>
> gives:
> ```
>     0: \x80 PROTO      4                   Protocol version indicator.
>     2: \x95 FRAME      55                  Indicate the beginning of a new frame.
>    11: \x8c SHORT_BINUNICODE 'numpy'       Push a Python Unicode string object.
>    18: \x94 MEMOIZE    (as 0)              Store the stack top into the memo.  The stack is not popped.
>    19: \x8c SHORT_BINUNICODE 'dtype'       Push a Python Unicode string object.
>    26: \x94 MEMOIZE    (as 1)              Store the stack top into the memo.  The stack is not popped.
>    27: \x93 STACK_GLOBAL                   Push a global object (module.attr) on the stack.
>    28: \x94 MEMOIZE    (as 2)              Store the stack top into the memo.  The stack is not popped.
>    29: \x8c SHORT_BINUNICODE 'i4'          Push a Python Unicode string object.
>    33: \x94 MEMOIZE    (as 3)              Store the stack top into the memo.  The stack is not popped.
>    34: \x89 NEWFALSE                       Push False onto the stack.
>    35: \x88 NEWTRUE                        Push True onto the stack.
>    36: \x87 TUPLE3                         Build a three-tuple out of the top three items on the stack.
>    37: \x94 MEMOIZE    (as 4)              Store the stack top into the memo.  The stack is not popped.
>    38: R    REDUCE                         Push an object built from a callable and an argument tuple.
>    39: \x94 MEMOIZE    (as 5)              Store the stack top into the memo.  The stack is not popped.
>    40: (    MARK                           Push markobject onto the stack.
>    41: K        BININT1    3               Push a one-byte unsigned integer.
>    43: \x8c     SHORT_BINUNICODE '<'       Push a Python Unicode string object.
>    46: \x94     MEMOIZE    (as 6)          Store the stack top into the memo.  The stack is not popped.
>    47: N        NONE                       Push None on the stack.
>    48: N        NONE                       Push None on the stack.
>    49: N        NONE                       Push None on the stack.
>    50: J        BININT     -1              Push a four-byte signed integer.
>    55: J        BININT     -1              Push a four-byte signed integer.
>    60: K        BININT1    0               Push a one-byte unsigned integer.
>    62: t        TUPLE      (MARK at 40)    Build a tuple out of the topmost stack slice, after markobject.
>    63: \x94 MEMOIZE    (as 7)              Store the stack top into the memo.  The stack is not popped.
>    64: b    BUILD                          Finish building an object, via __setstate__ or dict update.
>    65: .    STOP                           Stop the unpickling machine.
> highest protocol among opcodes = 4
> ```
>
> I have tried to find the usual `__reduce__` and `__setstate__` methods to
> understand what are the various arguments, to no avail.
>

First, be sure to read the generic `object.__reduce__` docs:

https://docs.python.org/3.11/library/pickle.html#object.__reduce__

Here is the C source for `np.dtype.__reduce__()`:

https://github.com/numpy/numpy/blob/main/numpy/_core/src/multiarray/descriptor.c#L2623-L2750

And `np.dtype.__setstate__()`:

https://github.com/numpy/numpy/blob/main/numpy/_core/src/multiarray/descriptor.c#L2787-L3151
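
As a rough illustration of how the unpickler drives those two methods, here is a sketch that replays the pickle without NumPy installed. It is not NumPy's implementation: the byte string is reconstructed by hand from the disassembly quoted above, and `FakeDtype` and `DtypeSniffer` are hypothetical names introduced only for this example. `REDUCE` calls the global with the argument tuple, and `BUILD` then hands the state tuple to `__setstate__`:

```python
import io
import pickle

# Protocol-4 pickle of np.dtype("int32"), reconstructed from the
# disassembly above (66 bytes; the FRAME length 0x37 is 55).
PICKLED_INT32 = (
    b"\x80\x04\x95\x37\x00\x00\x00\x00\x00\x00\x00"
    b"\x8c\x05numpy\x94\x8c\x05dtype\x94\x93\x94"
    b"\x8c\x02i4\x94\x89\x88\x87\x94R\x94"
    b"(K\x03\x8c\x01<\x94NNN"
    b"J\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00t\x94b."
)


class FakeDtype:
    """Hypothetical stand-in that records what the pickle passes in."""

    def __init__(self, *args):
        self.ctor_args = args   # filled by REDUCE: callable(*args)
        self.state = None

    def __setstate__(self, state):
        self.state = state      # filled by BUILD


class DtypeSniffer(pickle.Unpickler):
    """Resolve numpy.dtype to the stand-in and refuse every other global."""

    def find_class(self, module, name):
        if (module, name) == ("numpy", "dtype"):
            return FakeDtype
        raise pickle.UnpicklingError(f"unexpected global {module}.{name}")


obj = DtypeSniffer(io.BytesIO(PICKLED_INT32)).load()
print(obj.ctor_args)  # ('i4', False, True)
print(obj.state)      # (3, '<', None, None, None, -1, -1, 0)
```

A reader written in another language would need the same two hooks: a handler for the `numpy.dtype` global and a handler for the state tuple delivered by `BUILD`.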

> so, in :
> ```python
> >>> np.dtype("int32").__reduce__()[1]
> ('i4', False, True)
> ```
>

These are arguments to the `np.dtype` constructor and are documented in
`np.dtype.__doc__`. The `False, True` arguments are hardcoded and always
have those values.


> ```python
> >>> np.dtype("int32").__reduce__()[2]
> (3, '<', None, None, None, -1, -1, 0)
> ```
>

These are arguments to pass to `np.dtype.__setstate__()` after the object
has been created.

0. `3` is the version number of the state; `3` is typical for simple
dtypes; datetimes and others with metadata will bump this to `4` and use a
9-element tuple instead of this 8-element tuple.
1. `'<'` is the endianness flag.
2. If there are subarrays (e.g. `np.dtype((np.int32, (2,2)))`), that info
goes here.
3. If there are fields, a tuple of the names of the fields.
4. If there are fields, the field descriptor dict.
5. If extended dtype (e.g. fields, strings, void, etc.), the element size,
else `-1`.
6. If extended dtype, the alignment, else `-1`.
7. The `flags` bit-flags; see `np.dtype.flags.__doc__`.
8. If datetime or with metadata, that metadata here, else absent.

-- 
Robert Kern