[Numpy-discussion] Re: Fortran-style support in ctypeslib.as_array()?

2023-03-23 Thread Eric Wieser
Hi Monte,

This strikes me as a slightly strange request; ctypes is intended to
interface with the C memory model, which has no native representation of
Fortran arrays.
Under the hood, `as_array` works by casting your ctypes pointer into a
ctypes array object and then passing that into numpy.
That approach doesn't work when there is no ctypes representation to begin
with.

If you were writing C code to work with Fortran arrays, you would probably
flatten your data into a single 1D array. You can use the same approach
here:

>>> np.ctypeslib.as_array(a_ptr, shape=(a.size,)).reshape(a.shape, order='F')
array([[0, 1, 2],
       [3, 4, 5]])
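
Putting the whole round trip together, here's a self-contained sketch (I've
pinned the dtype to int32 so the ctypes pointer type below is unambiguous):

```
import ctypes as ct
import numpy as np

a = np.asfortranarray(np.arange(6, dtype=np.int32).reshape(2, 3))
a_ptr = a.ctypes.data_as(ct.POINTER(ct.c_int32))

flat = np.ctypeslib.as_array(a_ptr, shape=(a.size,))  # raw memory order
b = flat.reshape(a.shape, order='F')  # reinterpret as Fortran-ordered
assert (b == a).all()
```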

Eric

On Thu, 23 Mar 2023 at 17:04,  wrote:

> Would it be okay to add an argument to ctypeslib.as_array() that allowed
> specifying that a pointer references column-major memory layout?
>
> Currently if we use ndarray.ctypes.data_as() to get a pointer to a
> Fortran-ordered array and then we use ctypeslib.as_array() to read that
> same array back in, we don't have a way of doing the round trip correctly.
>
> For example:
> >>> import ctypes as ct
> >>> a = np.arange(6).reshape(2,3)
> >>> a = np.asfortranarray(a)
> >>> a
> array([[0, 1, 2],
>        [3, 4, 5]])
> >>> a_ptr = a.ctypes.data_as(ct.POINTER(ct.c_int))
> >>> b = np.ctypeslib.as_array(a_ptr, shape=a.shape)
> >>> b
> array([[0, 3, 1],
>        [4, 2, 5]])
>
> The proposed function signature would be something like:
> numpy.ctypeslib.as_array(obj, shape=None, order=None), with
> order : {'C', 'F'}, optional
>
> Thanks,
> Monte


Re: [Numpy-discussion] Drop LGTM testing.

2021-08-14 Thread Eric Wieser
>  I'd be happy to tag someone at LGTM, but I don't know who that would be.

I've tagged @jhelie in the past with some success, although it's been a
while so they may no longer be employed there!

Eric

On Sat, 14 Aug 2021 at 22:16, Charles R Harris 
wrote:

>
>
> On Sat, Aug 14, 2021 at 2:35 PM Eric Wieser 
> wrote:
>
>> This might be worth creating a GitHub issue for, simply so we can tag
>> someone working at LGTM; they've been helpful in the past, and it's
>> possible we just need to fiddle with some configuration to make it work.
>>
>> It's also worth noting that LGTM runs against C code too; so even if we
>> disable it for python, it might be worth keeping around for C.
>>
>
> It's the C code that causes problems: LGTM builds the code with `python3
> setup.py`, and setup.py has a check for the Python version. There is no
> method to disable the C checks from the GitHub app and no method to specify
> a Python version beyond 2 or 3.
>
> I'd be happy to tag someone at LGTM, but I don't know who that would be.
>
> Chuck


Re: [Numpy-discussion] Drop LGTM testing.

2021-08-14 Thread Eric Wieser
This might be worth creating a GitHub issue for, simply so we can tag
someone working at LGTM; they've been helpful in the past, and it's
possible we just need to fiddle with some configuration to make it work.

It's also worth noting that LGTM runs against C code too; so even if we
disable it for python, it might be worth keeping around for C.

Eric

On Sat, 14 Aug 2021 at 21:27, Charles R Harris 
wrote:

> Hi All,
>
> LGTM on github uses Python 3.7, which causes a problem if we drop 3.7
> support. LGTM is nice for pointing to possible code improvements, but we
> mostly ignore it anyway. There are probably standalone code analysers that
> would serve our needs as well, so dropping it seems the easiest way forward.
>
> Thoughts?
>
> Chuck


Re: [Numpy-discussion] copy="never" discussion and no deprecation cycle?

2021-06-21 Thread Eric Wieser
Stefan, that sketch is more complicated than it needs to be - `np.copy` is
a python function, so you can just attach the attributes directly!
(although maybe there are implications for static typing)
```
class CopyFlag(enum.Enum):
    IF_NEEDED = 0
    ALWAYS = 1
    NEVER = 2

np.copy.IF_NEEDED = CopyFlag.IF_NEEDED
np.copy.ALWAYS = CopyFlag.ALWAYS
np.copy.NEVER = CopyFlag.NEVER
```
It would also work nicely for the `True/False/other` version that was
proposed in the much older PR as `np.never_copy`:
```
class _CopyNever:
    def __bool__(self): raise ValueError

np.copy.NEVER = _CopyNever()
```

All of these versions (and using the enum directly) seem fine to me.
If we go down the enum route, we probably want to add "new-style"
versions of `np.CLIP` and friends that are true enums / live within a more
obvious namespace.

Eric

On Mon, 21 Jun 2021 at 17:24, Stefan van der Walt 
wrote:

> On Sun, Jun 20, 2021, at 20:46, Gagandeep Singh wrote:
> > I have recently joined the mailing list and have gone through the
> previous discussions on this thread. I would like to share my analysis
> (advantages and disadvantages) of three possible alternatives (Enum,
> String, boolean) to support the proposed feature.
>
> Thanks for this thorough analysis, Gagandeep!
>
> I'll throw one more heretical idea out there:
>
> `np.copy.IF_NEEDED`, `np.copy.ALWAYS`, `np.copy.NEVER`.
>
> This has the advantages of the enum, doesn't pollute the global namespace,
> and has an intuitive name.
>
> `np.array(x, copy=np.copy.ALWAYS)`
>
> It would be slightly more awkward to type, but is doable.  A rough Python
> version sketch would be:
>
> class CopyFlag(enum.Enum):
>     IF_NEEDED = 0
>     ALWAYS = 1
>     NEVER = 2
>
> class NpCopy:
>     IF_NEEDED : CopyFlag = CopyFlag.IF_NEEDED
>     ALWAYS : CopyFlag = CopyFlag.ALWAYS
>     NEVER : CopyFlag = CopyFlag.NEVER
>
>     def __call__(self, x):
>         return ...whatever copy returns...
>
> np.copy = NpCopy()
>
>
> Stéfan


Re: [Numpy-discussion] copy="never" discussion and no deprecation cycle?

2021-06-16 Thread Eric Wieser
I agree with Stephan, but even 3 seems dangerous to me. Any code that wraps
a numpy function and accepts a `copy` parameter (especially
`__array_function__`) is likely to contain `if copy` somewhere, which would
result in entirely the wrong (but likely silent) behavior for
`copy="never"`. An important reason for the original `np.never_copy`
suggestion the first time this was discussed is that it can overload
`__bool__` to raise, or to return `False` and warn, which would make silent
bad behavior visible one way or another.
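
As a tiny illustration of that failure mode (the sentinel and wrapper here
are made-up names for the example, not actual NumPy API):

```
class _NeverCopy:
    def __bool__(self):
        raise ValueError("copy-never sentinel is not a plain boolean")

NEVER_COPY = _NeverCopy()

def wrapper(x, copy=True):
    if copy:  # the pattern lurking in lots of downstream wrapping code
        x = list(x)
    return x

wrapper([1, 2], copy="never")  # silently copies: the string "never" is truthy!
try:
    wrapper([1, 2], copy=NEVER_COPY)
except ValueError:
    print("the sentinel turns the silent bug into a loud one")
```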

I think a short NEP might be in order here, just so we can make sure we've
addressed everything that came up the previous time this was discussed.

Eric.



On Wed, Jun 16, 2021, 23:00 Stephan Hoyer  wrote:

> On Wed, Jun 16, 2021 at 1:01 PM Sebastian Berg 
> wrote:
>
>> 2. We introduce `copy="never"`, `copy="if_needed"` and `copy="always"`
>>as strings (all other strings will be a `TypeError`):
>>
>>* Problem: `copy="never"` currently means `copy=True` (the opposite)
>>Which means new code has to take care when it may run on
>>older NumPy versions.  And in theory could make old code
>>return the wrong thing.
>>
>
> To me, this seems like a big problem.
>
> People try to use newer NumPy features on old versions of NumPy all the
> time. This works out OK if they get error messages, but we shouldn't add
> new features that silently do something else on old versions -- especially
> for recent old versions.
>
> In particular, both copy='if_needed' and copy='never' would mean
> copy='always' on old versions of NumPy. This seems bad -- basically the
> exact opposite of what the user explicitly requested. These sort of bugs
> can be quite challenging to track down.
>
> So in my opinion (1) and (3) are the only real options.
>
>
>> 3. Same as 2. But we take it very slow: Make strings an error right now
>>and only introduce the new options after two releases as per typical
>>deprecation policy.
>>
>>
>> ## Discussion
>>
>> We discussed it briefly today in the triage call and we were leaning
>> towards strings.
>>
>> I was honestly expecting to converge to option 3 to avoid compatibility
>> issues (mainly surprises with `copy="never"` on older versions).
>> But considering how weird it is to currently pass `copy="never"`, the
>> question was whether we should not change it with a release note.
>>
>> The probability of someone currently passing exactly one of those three
>> (and no other) strings seems exceedingly small.
>>
>> Personally, I don't have much of an opinion.  But if *nobody* voices
>> any concern about just changing the meaning of the string inputs, I
>> think the current default may be to just do it.
>>
>> Cheers,
>>
>> Sebastian
>>


Re: [Numpy-discussion] linalg.det for fractions

2021-05-16 Thread Eric Wieser
Numpy implements linalg.det by going through LAPACK, which only knows about
f4, f8, c8, and c16 data types.

Your request amounts to wanting an `O` dtype implementation. I think this
is a totally reasonable request as we already have such an implementation
for `np.matmul`; but it won't be particularly easy to implement or fast,
especially as it won't be optimized for fractions specifically.

Some other options for you would be to:

* use sympy's matrix operations; fractions are really just "symbolics lite"
* Extract a common denominator from your matrix, convert the numerators to
float64, and hope you don't exceed 2**52 in the result.

You could improve the second option a little by implementing (and PRing) an
integer loop for `det`, which would be somewhat easier than implementing
the object loop.
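
For concreteness, a rough sketch of that second option (this assumes
Python >= 3.9 for `math.lcm`; it is not a NumPy API):

```
from fractions import Fraction
from math import lcm
import numpy as np

M = np.array([[Fraction(1, 2), Fraction(1, 3)],
              [Fraction(1, 4), Fraction(1, 5)]], dtype=object)

d = lcm(*[f.denominator for f in M.ravel()])  # common denominator
ints = np.array([[float(f * d) for f in row] for row in M])

# det(M * d) == det(M) * d**n for an n x n matrix
n = M.shape[0]
det = Fraction(round(np.linalg.det(ints)), d**n)
print(det)  # 1/60
```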

Eric


On Sun, May 16, 2021, 10:14 Shashwat Jaiswal 
wrote:

> How about having linalg.det returning a fraction object when passed a
> numpy matrix of fractions?


Re: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies

2021-05-15 Thread Eric Wieser
Note that PEP-445, which introduced `PyMemAllocatorEx`, specifically rejected
omitting the `ctx` argument here:
https://www.python.org/dev/peps/pep-0445/#id23, which is another argument
in favor of having it.

I'll try to give a more thorough justification for the pyobject / capsule
suggestion in another message in the next few days.



On Thu, 13 May 2021 at 17:06, eliaskoromilas 
wrote:

> Eric Wieser wrote
> >> Yes, sorry, had been a while since I had looked it up:
> >>
> >> https://docs.python.org/3/c-api/memory.html#c.PyMemAllocatorEx
> >
> > That `PyMemAllocatorEx` looks almost exactly like one of the two variants
> > I
> > was proposing. Is there a reason for wanting to define our own structure
> > vs
> > just using that one?
> > I think the NEP should at least offer a brief comparison to that
> > structure,
> > even if we ultimately end up not using it.
> >
> >> I have to say it feels a bit
> >> like exposing things publicly, that are really mainly used internally,
> >> but not sure...  Presumably Python uses the `ctx` for something though.
> >
> > I'd argue `ctx` / `baton` / `user_data` arguments are an essential part
> of
> > any C callback API.
> > I can't find any particularly good reference for this right now, but I
> > have
> > been bitten multiple times by C APIs that forget to add this argument.
> >
> >>  If someone wants a different strategy (i.e. different alignment) they
> > create a new policy
> >
> > The crux of the problem here is that without very nasty hacks, C and C++
> > do
> > not allow new functions to be created at runtime.
> > This makes it very awkward to write a parameterizable allocator. If you
> > want to create two aligned allocators with different alignments, and you
> > don't have a `ctx` argument to plumb through that alignment information,
> > you're forced to write the entire thing twice.
>
> The `PyMemAllocatorEx` memory API will allow (lambda) closure-like
> definition of the data mem routines. That's the main idea behind the `ctx`
> thing; it's huge and will enable every allocation scenario.
>
> In my opinion, the rest of the proposals (PyObjects, PyCapsules, etc.) are
> secondary and could be considered out-of-scope. I would suggest to let
> people use this before hiding it behind a strict API.
>
> Let me also give you an insight of how we plan to do it, since we are the
> first to integrate this in production code. Considering this NEP as a
> primitive API, I developed a new project to address our requirements:
>
> 1. Provide a Python-native way to define a new numpy allocator
> 2. Accept data mem routine symbols (function pointers) from open dynamic
> libraries
> 3. Allow local-scoped allocation, e.g. inside a `with` statement
>
> But since there was not much fun in these, I thought it would be nice if we
> could exploit `ctypes` callback functions, to allow developers to hook into
> such routines natively (e.g. for debugging/monitoring), or even write them
> entirely in Python (of course there has to be an underlying memory
> allocation API).
>
> For example, the idea is to be able to define a page-aligned allocator in
> ~30 lines of Python code, like that:
>
>
> https://github.com/inaccel/numpy-allocator/blob/master/test/aligned_allocator.py
>
> ---
>
> While experimenting with this project I spotted the two following issues:
>
> 1. Thread-locality
> My biggest concern is the global scope of the numpy `current_allocator`
> variable. Currently, an allocator change is applied globally affecting
> every
> thread. This behavior breaks the local-scoped allocation promise of my
> project. Imagine for example the implications of allocating pinned
> (page-locked) memory (since you mention this use-case a lot) for random
> glue-code ndarrays in background threads.
>
> 2. Allocator context (already discussed)
> I found a bug, when I tried to use a Python callback (`ctypes.CFUNCTION`)
> for the `PyDataMem_FreeFunc` routine. Since there are cases in which the
> `free` routine is invoked after a PyErr has occurred (to clean up internal
> arrays for example), `ctypes` messes with the exception state badly. This
> problem can be resolved with the use of a `ctx` (allocator context) that
> will allow the routines to run clean of errors, wrapping them like this:
>
> ```
> static void wrapped_free(void *ptr, size_t size, void *ctx) {
>     PyObject *type;
>     PyObject *value;
>     PyObject *traceback;
>     PyErr_Fetch(&type, &value, &traceback);
>     ((Py

Re: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies

2021-05-11 Thread Eric Wieser
> Yes, sorry, had been a while since I had looked it up:
>
> https://docs.python.org/3/c-api/memory.html#c.PyMemAllocatorEx

That `PyMemAllocatorEx` looks almost exactly like one of the two variants I
was proposing. Is there a reason for wanting to define our own structure vs
just using that one?
I think the NEP should at least offer a brief comparison to that structure,
even if we ultimately end up not using it.

> That all looks like it can be customized in theory. But I am not sure
> that it is practical, except for hooking and calling the previous one.

Is chaining allocators not likely something we want to support too? For
instance, an allocator that is used for large arrays, but falls back to the
previous one for small arrays?

> I have to say it feels a bit
> like exposing things publicly, that are really mainly used internally,
> but not sure...  Presumably Python uses the `ctx` for something though.

I'd argue `ctx` / `baton` / `user_data` arguments are an essential part of
any C callback API.
I can't find any particularly good reference for this right now, but I have
been bitten multiple times by C APIs that forget to add this argument.

>  If someone wants a different strategy (i.e. different alignment) they
create a new policy

The crux of the problem here is that without very nasty hacks, C and C++ do
not allow new functions to be created at runtime.
This makes it very awkward to write a parameterizable allocator. If you
want to create two aligned allocators with different alignments, and you
don't have a `ctx` argument to plumb through that alignment information,
you're forced to write the entire thing twice.
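
To make that concrete, here is an illustrative C sketch (the names are
invented for the example, they are not part of the NEP) of how a single
callback plus a `ctx` pointer can serve any alignment:

```C
#include <stdlib.h>

typedef struct { size_t alignment; } aligned_ctx;

static void *aligned_alloc_cb(void *ctx, size_t size) {
    size_t align = ((aligned_ctx *)ctx)->alignment;
    /* C11 aligned_alloc requires size to be a multiple of the alignment */
    size = (size + align - 1) / align * align;
    return aligned_alloc(align, size);
}

/* one function, arbitrarily many allocators: */
static aligned_ctx ctx16 = {16}, ctx4096 = {4096};
/* register (aligned_alloc_cb, &ctx16) or (aligned_alloc_cb, &ctx4096) */
```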

> I guess the C++ similarity may be a reason, but I am not familiar with
that.

Similarity isn't the only motivation - I was considering compatibility.
Consider a user who's already written a shiny stateful C++ allocator, and
wants to use it with numpy.
I've made a gist at
https://gist.github.com/eric-wieser/6d0fde53fc1ba7a2fa4ac208467f2ae5 which
demonstrates how to hook an arbitrary C++ allocator into this new numpy
allocator API, that compares both the NEP version and the version with an
added `ctx` argument.
The NEP version has a bug that is very hard to fix without duplicating the
entire `numpy_handler_from_cpp_allocator` function.

If compatibility with C++ seems too much of a stretch, the NEP API is not
even compatible with `PyMemAllocatorEx`.

> But right now the proposal says this is static, and I honestly don't
> see much reason for it to be freeable?  The current use-cases `cupy` or
> `pnumpy` don't not seem to need it.

I don't know much about either of these use cases, so the following is
speculative.
In cupy, presumably the application is to tie allocation to a specific GPU
device.
Presumably then, somewhere in the python code there is a handle to a GPU
object, through which the allocators operate.
If that handle is stored in the allocator, and the allocator is freeable,
then it is possible to write code that automatically releases the GPU
handle after the allocator has been restored to the default and the last
array using it is cleaned up.

If that cupy use-case seems somewhat plausible, then I think we should go
with the PyObject approach.
If it doesn't seem plausible, then I think the `ctx` approach is
acceptable, and we should consider declaring our struct
```struct { PyMemAllocatorEx allocator; char const *name; }``` to reuse the
existing python API unless there's a reason not to.

Eric




On Tue, 11 May 2021 at 04:58, Matti Picus  wrote:

> On 10/5/21 8:43 pm, Sebastian Berg wrote:
>
> > But right now the proposal says this is static, and I honestly don't
> > see much reason for it to be freeable?
>
>
> I think this is the crux of the issue. The current design is for a
> singly-allocated struct to be passed around since it is just an
> aggregate of functions. If someone wants a different strategy (i.e.
> different alignment) they create a new policy: there are no additional
> parameters or data associated with the struct. I don't really see an ask
> from possible users for anything more, and so would prefer to remain
> with the simplest possible design. If the need arises in the future for
> additional data, which is doubtful, I am confident we can expand this as
> needed, and do not want to burden the current design with unneeded
> optional features.
>
>
> It would be nice to hear from some actual users if they need the
> flexibility.
>
>
> In any case I would like to resolve this quickly and get it into the
> next release, so if Eric is adamant that the advanced design is needed I
> will accept his proposal, since that seems easier than any of the
> alternatives so far.
>
>
> Matti
>

Re: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies

2021-05-10 Thread Eric Wieser
> The Python version of this does have a `void *ctx`, but I am not sure if
the use for this is actually valuable for the NumPy use-cases.

Do you mean "the CPython version"? If so, can you link a reference?

>  While I like the `PyObject *` idea, I am also not sure that it helps
much.  If we want allocation specific state, the user should overallocate
and save it before the actual allocation.

I was talking about allocator- not allocation- specific state. I agree that
the correct place to store the latter is by overallocating, but it doesn't
make much sense to me to duplicate state about the allocator itself in each
allocation.
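
For reference, the overallocation pattern for per-allocation state looks
something like this (a generic C sketch, not NumPy code):

```C
#include <stdlib.h>

typedef struct { size_t size; } alloc_header;

static void *my_alloc(size_t size) {
    alloc_header *h = malloc(sizeof(alloc_header) + size);
    if (h == NULL) {
        return NULL;
    }
    h->size = size;          /* per-allocation state lives in the header */
    return (void *)(h + 1);  /* hand out only the data region */
}

static void my_free(void *ptr) {
    free((alloc_header *)ptr - 1);  /* step back over the header */
}
```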

> But if we don't mind the churn it creates, the only serious idea I would
have right now is using a `FromSpec` API. We could allow get/set functions
on it though

We don't even need to go as far as a flexible `FromSpec` API. Simply having
a function to allocate (and free) the opaque struct and a handful of
getters ought to be enough to let us change the allocator to be stateful in
future.
On the other hand, this is probably about as much work as just making it a
PyObject in the first place.

Eric


Re: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies

2021-05-06 Thread Eric Wieser
Another argument for supporting stateful allocators would be compatibility
with the stateful C++11 allocator API, such as
https://en.cppreference.com/w/cpp/memory/allocator_traits/allocate.

Adding support for stateful allocators at a later date would almost
certainly create an ABI breakage or lots of pain around avoiding one.

I haven't thought very much about the PyCapsule approach (although it
appears some other reviewers on github considered it at one point), but
even building it from scratch, the overhead to support statefulness is not
large.
As I demonstrate on the github issue (18805), would amount to changing the
API from:
```C
// the version in the NEP
typedef void *(PyDataMem_AllocFunc)(size_t size);
typedef void *(PyDataMem_ZeroedAllocFunc)(size_t nelems, size_t elsize);
typedef void (PyDataMem_FreeFunc)(void *ptr, size_t size);
typedef void *(PyDataMem_ReallocFunc)(void *ptr, size_t size);
typedef struct {
    char name[200];
    PyDataMem_AllocFunc *alloc;
    PyDataMem_ZeroedAllocFunc *zeroed_alloc;
    PyDataMem_FreeFunc *free;
    PyDataMem_ReallocFunc *realloc;
} PyDataMem_HandlerObject;
const PyDataMem_Handler * PyDataMem_SetHandler(PyDataMem_Handler *handler);
const char * PyDataMem_GetHandlerName(PyArrayObject *obj);
```
to
```C
// proposed changes: a `PyObject *self` argument pointing to a
// `PyDataMem_HandlerObject`, and a `PyObject_HEAD`
typedef void *(PyDataMem_AllocFunc)(PyObject *self, size_t size);
typedef void *(PyDataMem_ZeroedAllocFunc)(PyObject *self, size_t nelems,
                                          size_t elsize);
typedef void (PyDataMem_FreeFunc)(PyObject *self, void *ptr, size_t size);
typedef void *(PyDataMem_ReallocFunc)(PyObject *self, void *ptr,
                                      size_t size);
typedef struct {
    PyObject_HEAD
    PyDataMem_AllocFunc *alloc;
    PyDataMem_ZeroedAllocFunc *zeroed_alloc;
    PyDataMem_FreeFunc *free;
    PyDataMem_ReallocFunc *realloc;
} PyDataMem_HandlerObject;
// steals a reference to handler; caller is responsible for decrefing the
// result
PyDataMem_Handler * PyDataMem_SetHandler(PyDataMem_Handler *handler);
// borrowed reference
PyDataMem_Handler * PyDataMem_GetHandler(PyArrayObject *obj);

// some boilerplate that numpy is already full of and doesn't impact users
// of non-stateful allocators
PyTypeObject PyDataMem_HandlerType = ...;
```
When constructing an array, the reference count of the handler would be
incremented before storing it in the array struct.

Since the extra work now to support this is not awful, but the potential
for ABI headaches down the road is, I think we should aim to support
statefulness right from the start.
The runtime overhead of the stateful approach above vs the NEP approach is
negligible, and consists of:
* Some overhead costs for setting up an allocator. This likely only happens
near startup, so won't matter.
* An extra incref on each array allocation
* An extra pointer argument on the stack for each allocation and
deallocation
* Perhaps around 32 extra bytes per allocator object. Since arrays just
store pointers to allocators this doesn't matter.

Eric


On Thu, 6 May 2021 at 12:43, Matti Picus  wrote:

>
> On 6/5/21 2:07 pm, Eric Wieser wrote:
> > The NEP looks good, but I worry the API isn't flexible enough. My two
> > main concerns are:
> >
> > ### Stateful allocators
> >
> > Consider an allocator that aligns to `N` bytes, where `N` is
> > configurable from a python call in someone else's extension module.
> > ...
> >
> > ### Thread and async-local allocators
> >
> > For tracing purposes, I expect it to be valuable to be able to
> > configure the allocator within a single thread / coroutine.
> > If we want to support this, we'd most likely want to work with the
> > PEP567 ContextVar API rather than a half-baked thread_local solution
> > that doesn't work for async code.
> >
> > This problem isn't as pressing as the statefulness problem.
> > Fixing it would amount to extending the `PyDataMem_SetHandler` API,
> > and would be unlikely to break any code written against the current
> > version of the NEP; meaning it would be fine to leave as a follow-up.
> > It might still be worth remarking upon as future work of some kind in
> > the NEP.
> >
> >
> I would prefer to leave both of these to a future extension for the NEP.
> Setting the alignment from a python-level call seems to be asking for
> trouble, and I would need to be convinced that the extra layer of
> flexibility is worth it.
>
>
> It might be worth mentioning that this NEP may be extended in the
> future, but truthfully I think that is the case for all NEPs.
>
>
> Matti
>


Re: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies

2021-05-06 Thread Eric Wieser
The NEP looks good, but I worry the API isn't flexible enough. My two main
concerns are:

### Stateful allocators

Consider an allocator that aligns to `N` bytes, where `N` is configurable
from a python call in someone else's extension module. Where do they store
`N`?
They can hide it in `PyDataMem_Handler::name` but that's obviously an abuse
of the API.
They can store it as a global variable, but then obviously the idea of
tracking the allocator used to construct an array doesn't work, as the
state ends up changing with the global allocator.

The easy way out here would be to add a `void* context` field to the
structure, and pass it into all the methods.
This doesn't really solve the problem though, as now there's no way to
cleanup any allocations used to populate `context`, or worse decrement
references to python objects stored within `context`.
I think we want to bundle `PyDataMem_Handler` in a `PyObject` somehow,
either via a new C type, or by using the PyCapsule API which has the
cleanup and state hooks we need.
`PyDataMem_GetHandlerName` would then return this PyObject rather than an
opaque name.

For a more exotic case, consider a file-backed allocator that is
constructed from a python `mmap` object and manages blocks within that
mmap.
The allocator needs to keep a reference to the `mmap` object alive until
all the arrays allocated within it are gone, but probably shouldn't leak a
reference to it either.

### Thread and async-local allocators

For tracing purposes, I expect it to be valuable to be able to configure
the allocator within a single thread / coroutine.
If we want to support this, we'd most likely want to work with the PEP567
ContextVar API rather than a half-baked thread_local solution that doesn't
work for async code.
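
A minimal sketch of the scoping behavior I mean, using only the stdlib (the
"allocator registry" here is hypothetical; only the ContextVar part is real):

```
import contextvars

current_allocator = contextvars.ContextVar("current_allocator",
                                           default="default")

def allocate():
    # stand-in for "allocate through whichever handler is current"
    return f"allocating via {current_allocator.get()}"

ctx = contextvars.copy_context()
ctx.run(current_allocator.set, "traced")

print(allocate())         # allocating via default
print(ctx.run(allocate))  # allocating via traced
```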

This problem isn't as pressing as the statefulness problem.
Fixing it would amount to extending the `PyDataMem_SetHandler` API, and
would be unlikely to break any code written against the current version of
the NEP; meaning it would be fine to leave as a follow-up.
It might still be worth remarking upon as future work of some kind in the
NEP.


Eric

On Thu, 6 May 2021 at 11:41, Matti Picus  wrote:

> Here is the current rendering of the NEP:
> https://numpy.org/neps/nep-0049.html
>
>
>
> The mailing list discussion, started on April 20 did not bring up any
> objections to the proposal, nor were there objections in the discussion
> around the text of the NEP. There were questions around details of the
> implementation, thank you reviewers for carefully looking at them and
> suggesting improvements.
>
>
> If there are no substantive objections within 7 days from this email,
> then the NEP will be accepted; see NEP 0 for more details.
>
>
> Matti
>


Re: [Numpy-discussion] Is there a defined way to "unpad" an array, and if not, should there be?

2021-04-13 Thread Eric Wieser
Some other options here that avoid the need for a new function:

* Add a `return_view` argument to `pad`, such that for `padded, orig =
np.pad(arr, ..., return_view=True)`, `orig == arr` and `orig.base is
padded`. This is useful if `padded` is modified in place, but less useful
otherwise. It has the advantage of not having to recompute the slices, as
pad already has them.
* Accept a `slice` object directly in `np.pad`; for `sl = np.s_[2:-20,
4:-40]`, `padded = np.pad(array, sl)`, we have `padded[sl] == array`.

The second idea seems promising to me, but perhaps there are corner cases I
haven't thought of that it wouldn't help with.
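
A rough sketch of how the slice-accepting idea could be emulated today (a
hypothetical helper, assuming negative stops as in the example above):

```
import numpy as np

def pad_from_slices(arr, slices):
    widths = [(s.start or 0, -s.stop if s.stop is not None else 0)
              for s in slices]
    return np.pad(arr, widths)

arr = np.ones((4, 6))
sl = np.s_[2:-20, 4:-40]
padded = pad_from_slices(arr, sl)
assert (padded[sl] == arr).all()
print(padded.shape)  # (26, 50)
```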

Eric

On Tue, 13 Apr 2021 at 09:26, Ralf Gommers  wrote:

>
>
> On Tue, Apr 13, 2021 at 3:37 AM Jeff Gostick  wrote:
>
>> It is great to hear that this might be useful.  I would LOVE to create a
>> PR on this idea and contribute back to numpy...but let's not get ahead of
>> ourselves :-)
>>
>> Regarding the name, I kinda like "unpad" since it relates directly to
>> "pad", analogous to "ravel" and "unravel" for instance.  Or maybe "depad".
>> Although, it's possible to use this on any array, not just a previously
>> padded one, so maybe tying it too directly to "pad" is not right, in which
>> case "trim" and "crop" are both perfect.  I must admit that I find it odd
>> that these functions are not in numpy already.  I just searched the docs
>> and they show up as keyword args for a few functions but are otherwise
>> conspicuously absent.  Also, funnily, there is a link to "padding arrays"
>> but it is basically empty:
>> https://numpy.org/doc/stable/reference/routines.padding.html.
>>
>> Alternatively, I don't hate the idea of passing negative pad widths into
>> "pad".  I actually tried this at one point to see if there was a hidden
>> functionality there, to no avail.
>>
>> BTW, we just adding a custom "unpad" function to our PoreSpy package for
>> this purpose:
>> https://github.com/PMEAL/porespy/blob/dev/porespy/tools/_unpadfunc.py
>>
>>
>>
>> On Mon, Apr 12, 2021 at 9:15 PM Stephan Hoyer  wrote:
>>
>>> On Mon, Apr 12, 2021 at 5:12 PM Jeff Gostick  wrote:
>>>
 I guess I should have clarified that I was inquiring about proposing a
 'feature request'.  The github site suggested I open a discussion on this
 list first.  There are several ways to effectively unpad an array as has
 been pointed out, but they all require more than a little bit of thought
 and care, are dependent on array shape, and honestly error prone.  It would
 be very valuable to me to have such a 'predefined' function, so I was
 wondering if (a) I was unaware of some function that already does this and
 (b) if I'm alone in thinking this would be useful.

>>>
>>> Indeed, this is a fair question.
>>>
>>> Given that this is not entirely trivial to write correctly, I think it
>>> would be reasonable to add the inverse operation for pad() into NumPy. This
>>> is generally better than encouraging users to write their own thing.
>>>
>>> From a naming perspective, here are some possibilities:
>>> unpad
>>> trim
>>> crop
>>>
>>> I think "trim" would be pretty descriptive, probably slightly better
>>> than "unpad."
>>>
>>
> I'm not a fan of `trim`. We already have `clip` which sounds similar.
>
> `unpad` looks like the only one that's completely unambiguous.
>
> `crop` sounds like an image processing function, and what we don't want is
> something like Pillow's `crop(left, top, right, bottom)`.
>
> Cheers,
> Ralf
>
>


Re: [Numpy-discussion] MAINT: Use of except-pass blocks

2021-04-06 Thread Eric Wieser
I think the add_docstring one is best left alone, since if it fails once it
will probably fail for every docstring in the system, and the logging would
be pure noise.

The ma/core one is suspicious - I can't think of any examples where an
error would occur, but if you're interested, I'd encourage you to try and
come up with a corner-case that enters that `except` block.

Eric

On Tue, Apr 6, 2021, 20:13 Michael Dubravski  wrote:

> Okay thank you for the input. Do you have any recommendations for the type
> of exception classes that they could be changed to?
>
>
>
> *From: *NumPy-Discussion on behalf of Benjamin Root
> *Reply-To: *Discussion of Numerical Python 
> *Date: *Tuesday, April 6, 2021 at 2:58 PM
> *To: *Discussion of Numerical Python 
> *Subject: *Re: [Numpy-discussion] MAINT: Use of except-pass blocks
>
>
>
> In both of those situations, the `pass` aspect makes sense, although they
> probably should specify a better exception class to catch. The first one,
> with the copyto(), has a comment that explains what is going on. The second
> one, dealing with adding to the docstring, is needed because one can run
> python in the "optimized" mode, which strips out docstrings.
>
>
>
> On Tue, Apr 6, 2021 at 2:27 PM Michael Dubravski 
> wrote:
>
> Hello everyone,
>
>
>
> There are multiple instances of except-pass blocks within the codebase
> that to my knowledge are bad practices (referencing this StackOverflow
> article). For example, in numpy/ma/core.py there is an except-pass block that
> catches all exceptions thrown. Another example of this can be found in
> numpy/core/function_base.py. I was wondering if it would be a good idea
> to add some print statements for logging the exceptions caught. Also for
> cases where except-pass blocks are needed, is there an explanation for not
> logging exceptions?
>
>
>
>
> https://github.com/numpy/numpy/blob/914407d51b878bf7bf34dbd8dd72cc2dbc428673/numpy/ma/core.py#L1034-L1041
>
>
>
>
> https://github.com/numpy/numpy/blob/914407d51b878bf7bf34dbd8dd72cc2dbc428673/numpy/core/function_base.py#L461-L472
>
>
>
> Thanks,
>
> Michael Dubravski
>


Re: [Numpy-discussion] Programmatically contracting multiple tensors

2021-03-12 Thread Eric Wieser
Einsum has a secret integer-argument format that appears in the Examples
section of the `np.einsum` docs, but is not mentioned at all in the
parameter listing.
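
For instance, sketching your example below in the integer form (labels are
just ints, so there's no 26-axis limit):

```
import numpy as np

M = np.ones((2, 47, 3, 47, 3))  # spin, atom, coord, atom, coord
R = np.eye(3)

# equivalent to np.einsum('Cc,Ee,abcde->abCdE', R, R, M):
out = np.einsum(R, [10, 2], R, [11, 4],
                M, [0, 1, 2, 3, 4], [0, 1, 10, 3, 11])
print(out.shape)  # (2, 47, 3, 47, 3)
```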

Eric

On Sat, 13 Mar 2021 at 00:25, Michael Lamparski 
wrote:

> Greetings,
>
> I have something in my code where I can receive an array M of unknown
> dimensionality and a list of "labels" for each axis.  E.g. perhaps I might
> get an array of shape (2, 47, 3, 47, 3) with labels ['spin', 'atom',
> 'coord', 'atom', 'coord'].
>
> For every axis that is labeled "coord", I want to multiply in some
> rotation matrix R.  So, for the above example, this could be done with the
> following handwritten line:
>
> return np.einsum('Cc,Ee,abcde->abCdE', R, R, M)
>
> But since I want to do this programmatically, I find myself in the awkward
> situation of having to construct this string (and e.g. having to
> arbitrarily limit the number of axes to 26 or something like that).  Is
> there a more idiomatic way to do this that would let me supply integer
> labels for summation indices?  Or should I just bite the bullet and start
> generating strings?
>
> ---
> Michael


Re: [Numpy-discussion] ENH: Proposal to add atleast_nd function

2021-02-12 Thread Eric Wieser
> There might be some linear algebraic reason why those axis positions make
sense, but I’m not aware of it...

My guess is that the historical motivation was to allow grayscale `(H, W)`
images to be converted into `(H, W, 1)` images so that they can be
broadcast against `(H, W, 3)` RGB images.
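
A quick illustration of that use:

```
import numpy as np

gray = np.arange(12.).reshape(3, 4)  # (H, W) grayscale
rgb = np.ones((3, 4, 3))             # (H, W, 3) RGB

g3 = np.atleast_3d(gray)             # (3, 4, 1): the new axis goes at the end
print((g3 * rgb).shape)              # (3, 4, 3), thanks to broadcasting
```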

Eric

On Fri, 12 Feb 2021 at 02:32, Juan Nunez-Iglesias  wrote:

> both napari and scikit-image use atleast_ a few times. I don’t have many
> examples of where I used nd because it didn’t exist. But I have the very
> distinct impression of needing it repeatedly. In some places, I’ve used
> `np.broadcast_to` to signal the same intention, where `atleast_nd` would
> have been the more readable solution.
>
> I don’t buy the argument that it’s just a way to mask errors. NumPy
> broadcasting also has that same potential but I hope no one would seriously
> consider deprecating it. Indeed, even if we accept that we (library
> authors) should force users to provide an array of the right
> dimensionality, that still argues for making it convenient for users to do
> that!
>
> I don’t feel super strongly about this. But I think atleast_nd is a move
> in a positive direction and I’d prefer  it to what’s there now:
>
> In [1]: import numpy as np
> In [2]: np.atleast_3d(np.ones(4)).shape
> Out[2]: (1, 4, 1)
>
> There might be some linear algebraic reason why those axis positions make
> sense, but I’m not aware of it...
>
> Juan.
>
> On 12 Feb 2021, at 5:32 am, Eric Wieser 
> wrote:
>
> I did a quick search of matplotlib, and found a few uses of all three
> functions:
>
> *
> https://github.com/matplotlib/matplotlib/blob/fed55c63a314351cd39a12783f385009782c06e1/lib/matplotlib/_layoutgrid.py#L441-L446
>   This one isn't really numpy at all, and is really just a shorthand for
> normalizing an argument `x=n` to `x=[n, n]`
> *
> https://github.com/matplotlib/matplotlib/blob/dd249744270f6abe3f540f81b7a77c0cb728ddbb/lib/matplotlib/mlab.py#L888
>This one is the classic "either multivariate or single-variable data"
> thing endemic to the SciPy ecosystem.
> *
> https://github.com/matplotlib/matplotlib/blob/1eef019109b64ee4085732544cb5e310e69451ab/lib/matplotlib/cbook/__init__.py#L1325-L1326
>   Matplotlib has their own `_check_1d` function for input sanitization,
> although github says it's only used to parse the arguments to `plot`, which
> at this point are fairly established as being flexible.
> *
> https://github.com/matplotlib/matplotlib/blob/f72adc49092fe0233a8cd21aa0f317918dafb18d/lib/matplotlib/transforms.py#L631
>   This just looks like "defensive programming", and if the argument isn't
> already 3d then something is probably wrong.
>
> This isn't an exhaustive list, just a handful of different situations the
> functions were used.
>
> Eric
>
>
>
> On Thu, 11 Feb 2021 at 18:15, Stephan Hoyer  wrote:
>
>> On Thu, Feb 11, 2021 at 9:42 AM Benjamin Root 
>> wrote:
>>
>>> for me, I find that the at_least{1,2,3}d functions are useful for
>>> sanitizing inputs. Having an at_leastnd() function can be viewed as a step
>>> towards cleaning up the API, not cluttering it (although, deprecations of
>>> the existing functions probably should be long given how long they have
>>> existed).
>>>
>>
>> I would love to see examples of this -- perhaps in matplotlib?
>>
>> My thinking is that in most cases it's probably a better idea to keep the
>> interface simpler, and raise an error for lower-dimensional arrays.
>> Automatic conversion is convenient (and endemic within the SciPy
>> ecosystem), but is also a common source of bugs.
>>
>> On Thu, Feb 11, 2021 at 1:56 AM Stephan Hoyer  wrote:
>>>
>>>> On Wed, Feb 10, 2021 at 9:48 PM Juan Nunez-Iglesias 
>>>> wrote:
>>>>
>>>>> I totally agree with the namespace clutter concern, but honestly, I
>>>>> would use `atleast_nd` with its `pos` argument (I might rename it to
>>>>> `position`, `axis`, or `axis_position`) any day over `at_least{1,2,3}d`,
>>>>> for which I had no idea where the new axes would end up.
>>>>>
>>>>> So, I’m in favour of including it, and optionally deprecating
>>>>> `atleast_{1,2,3}d`.
>>>>>
>>>>>
>>>> I appreciate that `atleast_nd` feels more sensible than
>>>> `at_least{1,2,3}d`, but I don't think "better" than a pattern we would not
>>>> recommend is a good enough reason for inclusion in NumPy. It needs to stand
>>>> on its own.
>>>>
>>>> What would be the recommended use-cases for this new function?

Re: [Numpy-discussion] ENH: Proposal to add atleast_nd function

2021-02-11 Thread Eric Wieser
I did a quick search of matplotlib, and found a few uses of all three
functions:

*
https://github.com/matplotlib/matplotlib/blob/fed55c63a314351cd39a12783f385009782c06e1/lib/matplotlib/_layoutgrid.py#L441-L446
  This one isn't really numpy at all, and is really just a shorthand for
normalizing an argument `x=n` to `x=[n, n]`
*
https://github.com/matplotlib/matplotlib/blob/dd249744270f6abe3f540f81b7a77c0cb728ddbb/lib/matplotlib/mlab.py#L888
   This one is the classic "either multivariate or single-variable data"
thing endemic to the SciPy ecosystem.
*
https://github.com/matplotlib/matplotlib/blob/1eef019109b64ee4085732544cb5e310e69451ab/lib/matplotlib/cbook/__init__.py#L1325-L1326
  Matplotlib has their own `_check_1d` function for input sanitization,
although github says it's only used to parse the arguments to `plot`, which
at this point are fairly established as being flexible.
*
https://github.com/matplotlib/matplotlib/blob/f72adc49092fe0233a8cd21aa0f317918dafb18d/lib/matplotlib/transforms.py#L631
  This just looks like "defensive programming", and if the argument isn't
already 3d then something is probably wrong.

This isn't an exhaustive list, just a handful of different situations the
functions were used.

Eric



On Thu, 11 Feb 2021 at 18:15, Stephan Hoyer  wrote:

> On Thu, Feb 11, 2021 at 9:42 AM Benjamin Root 
> wrote:
>
>> for me, I find that the at_least{1,2,3}d functions are useful for
>> sanitizing inputs. Having an at_leastnd() function can be viewed as a step
>> towards cleaning up the API, not cluttering it (although, deprecations of
>> the existing functions probably should be long given how long they have
>> existed).
>>
>
> I would love to see examples of this -- perhaps in matplotlib?
>
> My thinking is that in most cases it's probably a better idea to keep the
> interface simpler, and raise an error for lower-dimensional arrays.
> Automatic conversion is convenient (and endemic within the SciPy
> ecosystem), but is also a common source of bugs.
>
> On Thu, Feb 11, 2021 at 1:56 AM Stephan Hoyer  wrote:
>>
>>> On Wed, Feb 10, 2021 at 9:48 PM Juan Nunez-Iglesias 
>>> wrote:
>>>
 I totally agree with the namespace clutter concern, but honestly, I
 would use `atleast_nd` with its `pos` argument (I might rename it to
 `position`, `axis`, or `axis_position`) any day over `at_least{1,2,3}d`,
 for which I had no idea where the new axes would end up.

 So, I’m in favour of including it, and optionally deprecating
 `atleast_{1,2,3}d`.


>>> I appreciate that `atleast_nd` feels more sensible than
>>> `at_least{1,2,3}d`, but I don't think "better" than a pattern we would not
>>> recommend is a good enough reason for inclusion in NumPy. It needs to stand
>>> on its own.
>>>
>>> What would be the recommended use-cases for this new function?
>>> Have any libraries building on top of NumPy implemented a version of
>>> this?
>>>
>>>
 Juan.

 On 11 Feb 2021, at 9:48 am, Sebastian Berg 
 wrote:

 On Wed, 2021-02-10 at 17:31 -0500, Joseph Fox-Rabinovitz wrote:

 I've created PR#18386 to add a function called atleast_nd to numpy and
 numpy.ma. This would generalize the existing atleast_1d, atleast_2d,
 and
 atleast_3d functions.

 I proposed a similar idea about four and a half years ago:
 https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html
 ,
 PR#7804. The reception was ambivalent, but a couple of folks have asked
 me
 about this, so I'm bringing it back.

 Some pros:

 - This closes issue #12336
 - There are a couple of Stack Overflow questions that would benefit
 - Been asked about this a couple of times
 - Implementation of three existing atleast_*d functions gets easier
 - Looks nicer than the equivalent broadcasting and reshaping

 Some cons:

 - Cluttering up the API
 - Maintenance burden (but not a big one)
 - This is just a utility function, which can be achieved through
 broadcasting and reshaping


 My main concern would be the namespace cluttering. I can't say I use
 even the `atleast_2d` etc. functions personally, so I would tend to be
 slightly against the addition. But if others land on the "useful" side here
 (and it seemed a bit at least on github), I am also not opposed.  It is a
 clean name that lines up with existing ones, so it doesn't seem like a big
 "mental load" with respect to namespace cluttering.

 Bike shedding the API is probably a good idea in any case.

 I have pasted the current PR documentation (as html) below for quick
 reference. I wonder a bit about the reasoning for having `pos` specify a
 value rather than just a side?



 numpy.atleast_nd(*ary*, *ndim*, *pos=0*)
 View input as array with at least ndim dimensions.
 New unit dimensions are inserted at the index given by *pos* if
 necessary.

Re: [Numpy-discussion] ENH: Proposal to add atleast_nd function

2021-02-11 Thread Eric Wieser
> I find that the at_least{1,2,3}d functions are useful for sanitizing
inputs

IMO, this type of "sanitization" goes against "In the face of ambiguity,
refuse the temptation to guess".
Instead of using `at_least{n}d`, it could be argued that `if np.ndim(x) !=
n: raise ValueError` is a safer bet, which forces the user to think about
what's actually going on, and saves them from silent headaches.
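
i.e., something like this sketch:

```
import numpy as np

def process_image(img):
    if np.ndim(img) != 2:
        raise ValueError(f"expected a 2-D image, got {np.ndim(img)}-D")
    return np.asarray(img).mean()
```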

Of course, this is just an argument for discouraging users from using these
functions, and for the fact that we perhaps should not have had them in the
first place.
Given we already have some of them, adding `atleast_nd` probably isn't
going to make things any worse.
In principle, it could actually make things better, as we could put a
"Notes" section in the new function docs that describes the XY problem that
makes atleast_nd look like a better solution than it is and presents better
alternatives, and the other three function docs could link there.

Eric

On Thu, 11 Feb 2021 at 17:41, Benjamin Root  wrote:

> for me, I find that the at_least{1,2,3}d functions are useful for
> sanitizing inputs. Having an at_leastnd() function can be viewed as a step
> towards cleaning up the API, not cluttering it (although, deprecations of
> the existing functions probably should be long given how long they have
> existed).
>
> On Thu, Feb 11, 2021 at 1:56 AM Stephan Hoyer  wrote:
>
>> On Wed, Feb 10, 2021 at 9:48 PM Juan Nunez-Iglesias 
>> wrote:
>>
>>> I totally agree with the namespace clutter concern, but honestly, I
>>> would use `atleast_nd` with its `pos` argument (I might rename it to
>>> `position`, `axis`, or `axis_position`) any day over `at_least{1,2,3}d`,
>>> for which I had no idea where the new axes would end up.
>>>
>>> So, I’m in favour of including it, and optionally deprecating
>>> `atleast_{1,2,3}d`.
>>>
>>>
>> I appreciate that `atleast_nd` feels more sensible than
>> `at_least{1,2,3}d`, but I don't think "better" than a pattern we would not
>> recommend is a good enough reason for inclusion in NumPy. It needs to stand
>> on its own.
>>
>> What would be the recommended use-cases for this new function?
>> Have any libraries building on top of NumPy implemented a version of this?
>>
>>
>>> Juan.
>>>
>>> On 11 Feb 2021, at 9:48 am, Sebastian Berg 
>>> wrote:
>>>
>>> On Wed, 2021-02-10 at 17:31 -0500, Joseph Fox-Rabinovitz wrote:
>>>
>>> I've created PR#18386 to add a function called atleast_nd to numpy and
>>> numpy.ma. This would generalize the existing atleast_1d, atleast_2d, and
>>> atleast_3d functions.
>>>
>>> I proposed a similar idea about four and a half years ago:
>>> https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html
>>> ,
>>> PR#7804. The reception was ambivalent, but a couple of folks have asked
>>> me
>>> about this, so I'm bringing it back.
>>>
>>> Some pros:
>>>
>>> - This closes issue #12336
>>> - There are a couple of Stack Overflow questions that would benefit
>>> - Been asked about this a couple of times
>>> - Implementation of three existing atleast_*d functions gets easier
>>> - Looks nicer than the equivalent broadcasting and reshaping
>>>
>>> Some cons:
>>>
>>> - Cluttering up the API
>>> - Maintenance burden (but not a big one)
>>> - This is just a utility function, which can be achieved through
>>> broadcasting and reshaping
>>>
>>>
>>> My main concern would be the namespace cluttering. I can't say I use
>>> even the `atleast_2d` etc. functions personally, so I would tend to be
>>> slightly against the addition. But if others land on the "useful" side here
>>> (and it seemed a bit at least on github), I am also not opposed.  It is a
>>> clean name that lines up with existing ones, so it doesn't seem like a big
>>> "mental load" with respect to namespace cluttering.
>>>
>>> Bike shedding the API is probably a good idea in any case.
>>>
>>> I have pasted the current PR documentation (as html) below for quick
>>> reference. I wonder a bit about the reasoning for having `pos` specify a
>>> value rather than just a side?
>>>
>>>
>>>
>>> numpy.atleast_nd(*ary*, *ndim*, *pos=0*)
>>> View input as array with at least ndim dimensions.
>>> New unit dimensions are inserted at the index given by *pos* if
>>> necessary.
>>> Parameters
>>> ----------
>>> ary : array_like
>>>     The input array. Non-array inputs are converted to arrays. Arrays that
>>>     already have ndim or more dimensions are preserved.
>>> ndim : int
>>>     The minimum number of dimensions required.
>>> pos : int, optional
>>>     The index to insert the new dimensions. May range from -ary.ndim - 1
>>>     to +ary.ndim (inclusive). Non-negative indices indicate locations
>>>     before the corresponding axis: pos=0 means to insert at the very
>>>     beginning. Negative indices indicate locations after the corresponding
>>>     axis: pos=-1 means to insert at the very end. 0 and -1 are always
>>>     guaranteed to work. Any other number will depend on the dimensions of
>>>     the existing array. Default is 0.
>>> Returns
>>> -------
>>> res : ndarray
>>>     An array with res.ndim

Re: [Numpy-discussion] np.{bool,float,int} deprecation

2020-12-11 Thread Eric Wieser
>  you might want to discuss this with us at the array API standard
> https://github.com/data-apis/array-api (which is currently in RFC
> stage). The spec uses bool as the name for the boolean dtype.

I don't fully understand this argument - `np.bool` is already not the
boolean dtype. Either:

* The spec is suggesting that `pkg.bool` be some arbitrary object that can
be passed into a dtype argument and will produce a boolean array.
  If this is the case, the spec could also just require that
`dtype=builtins.bool` have this behavior.
* The spec is suggesting that `pkg.bool` is some rich dtype object.
  Ignoring the question of whether this should be `np.bool_` or
`np.dtype(np.bool_)`, it's currently neither, and changing it will break
users relying on `np.bool(True) is True`.
  That's not to say this isn't a sensible thing for the specification to
have, it's just something that numpy can't conform to without breaking code.

While it would be great if `np.bool_` could be spelt `np.bool`, I really
don't think we can make that change without a long deprecation first (if at
all).
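
To spell out the current behavior that such code relies on (checked against
NumPy versions where the `np.bool` alias still exists):

```
import numpy as np  # a version where the np.bool alias still exists

assert np.bool is bool                       # the alias IS the builtin today
assert np.bool(True) is True                 # ...which some code relies on
assert np.bool_(True) is not True            # the scalar type is distinct
assert np.dtype(bool) == np.dtype(np.bool_)  # both spell the boolean dtype
```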

Eric

On Thu, 10 Dec 2020 at 20:00, Sebastian Berg 
wrote:

> On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
> > On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg <
> > sebast...@sipsolutions.net>
> > wrote:
> >
> > > On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
> > > > On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer 
> > > > wrote:
> > > >
> > > > > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
> > > > > > > Regarding np.bool specifically, if you want to deprecate
> > > > > > > this,
> > > > > > > you
> > > > > > > might want to discuss this with us at the array API
> > > > > > > standard
> > > > > > > https://github.com/data-apis/array-api (which is currently
> > > > > > > in
> > > > > > > RFC
> > > > > > > stage). The spec uses bool as the name for the boolean
> > > > > > > dtype.
> > > > > > >
> > > > > > > Would it make sense for NumPy to change np.bool to just be
> > > > > > > the
> > > > > > > boolean
> > > > > > > dtype object? Unlike int and float, there is no ambiguity
> > > > > > > with
> > > > > > > bool,
> > > > > > > and NumPy clearly doesn't have any issues with shadowing
> > > > > > > builtin
> > > > > > > names
> > > > > > > in its namespace.
> > > > > >
> > > > > > We could keep the Python alias around (which for `dtype=` is
> > > > > > the
> > > > > > same
> > > > > > as `np.bool_`).
> > > > > >
> > > > > > I am not sure I like the idea of immediately shadowing the
> > > > > > builtin.
> > > > > > That is a switch we can avoid flipping (without warning);
> > > > > > `np.bool_`
> > > > > > and `bool` are fairly different beasts? [1]
> > > > >
> > > > > NumPy already shadows a lot of builtins, in many cases, in ways
> > > > > that
> > > > > are incompatible with existing ones. It's not something I would
> > > > > have
> > > > > done personally, but it's been this way for a long time.
> > > > >
> > > >
> > > > It may be defensible to keep np.bool as an alias for Python's
> > > > bool
> > > > even when we remove the other aliases.
> > >
> >
> > I'd agree with that.
> >
> >
> > > That is true, `int` is probably the most confusing, since it is not
> > > at
> > > all compatible to a Python integer, but rather the "default"
> > > integer
> > > (which happens to be the same as C `long` currently).
> > >
> > > So we could focus on `np.int`, `np.long`.  I am a bit unsure
> > > whether
> > > you would prefer that or are mainly pointing out the possibility?
> > >
> >
> > Not sure what you mean with focus, focus on describing in the release
> > notes? Deprecating `np.int` seems like the most beneficial part of
> > this
> > whole exercise.
> >
>
> I meant limiting the current deprecation to `np.int`, maybe `np.long`,
> and a "carefully chosen" set.
> To be honest, I don't mind either way, so any stronger opinion will tip
> the scale for me personally (my default currently is to update the
> release notes to recommend the more descriptive names).
>
> There are probably more doc updates that would be nice, I will suggest
> updating a separate issue for that.
>
>
> > Right now, my main take-away from the discussion is that it would be
> > > good to clarify the release notes a bit more.
> > >
> > > Using `float` for a dtype seems fine to me, but I prefer mentioning
> > > `np.float64` over `np.float_`.
> > > For integers, I wonder if we should also suggest `np.int64`, even –
> > > or
> > > because – if the default integer on many systems is currently
> > > `np.int_`?
> > >
> >
> > I agree. I think we should recommend sane, descriptive names that do
> > the
> > right thing. So ideally we'd have people spell their dtype specifiers
> > as
> >   dtype=bool  # or np.bool
> >   dtype=np.float64
> >   dtype=np.int64
> >   dtype=np.complex128
> > The names with underscores at the end make little sense from a UX
> > perspective. And the 

Re: [Numpy-discussion] np.{bool,float,int} deprecation

2020-12-07 Thread Eric Wieser
If the CI noise in downstream libraries is particularly painful, we could
switch to `PendingDeprecationWarning` instead of `DeprecationWarning` to
make it easier to add the warnings to an ignore list.
I think this might make the warning less visible to end users though, who
are the users that this deprecation was really aimed at.
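
For example, a downstream project that wants quiet CI could opt out with
a standard filter (a sketch; the message pattern is hypothetical and
would depend on the final warning text):

    import warnings

    # Silence only this category, without hiding hard DeprecationWarnings:
    warnings.filterwarnings("ignore", category=PendingDeprecationWarning)

    # or, scoped more narrowly via the (hypothetical) message text:
    warnings.filterwarnings(
        "ignore",
        message=r".*np\.bool.*",
        category=PendingDeprecationWarning,
    )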

Eric

On Mon, 7 Dec 2020 at 11:39, Ralf Gommers  wrote:

>
>
> On Sun, Dec 6, 2020 at 4:23 PM Sebastian Berg 
> wrote:
>
>> On Sat, 2020-12-05 at 20:12 -0700, Charles R Harris wrote:
>> > On Sat, Dec 5, 2020 at 4:31 PM Juan Nunez-Iglesias 
>> > wrote:
>> >
>> > > Hi all,
>> > >
>> > > At the prodding [1] of Sebastian, I’m starting a discussion on the
>> > > decision to deprecate np.{bool,float,int}. This deprecation broke
>> > > our
>> > > prerelease testing in scikit-image (which, hooray for rcs!), and
>> > > resulted
>> > > in a large amount of code churn to fix [2].
>> > >
>> > > To be honest, I do think *some* sort of deprecation is needed,
>> > > because for
>> > > the longest time I thought that np.float was what np.float_
>> > > actually is. I
>> > > think it would be worthwhile to move to *that*, though it’s an even
>> > > more
>> > > invasive deprecation than the currently proposed one. Writing `x =
>> > > np.zeros(5, dtype=int)` is somewhat magical, because someone with a
>> > > strict
>> > > typing mindset (there’s an increasing number!) might expect that
>> > > this is an
>> > > array of pointers to Python ints. This is why I’ve always preferred
>> > > to
>> > > write `dtype=np.int`, resulting in the current code churn.
>> > >
>> > > I don’t know what the best answer is, just sparking the discussion
>> > > Sebastian wants to see. ;) For skimage we’ve already merged a fix
>> > > (even if
>> > > it is one of dubious quality, as Stéfan points out [3] ;), so I
>> > > don’t have
>> > > too much stake in the outcome.
>> > >
>> > > Juan.
>> > >
>> > > [1]:
>> > >
>> https://github.com/scikit-image/scikit-image/pull/5103#issuecomment-739334463
>> > > [2]: https://github.com/scikit-image/scikit-image/pull/5103
>> > > [3]:
>> > >
>> https://github.com/scikit-image/scikit-image/pull/5103#issuecomment-739368765
>> > >
>> >
>> > I checked pandas and astropy and both have several uses of the
>> > deprecated
>> > types but should be easy to fix. I suppose the question is if we want
>> > to
>> > make them fix things *right now* :)
>> >
>>
>>
>> The reason why I thought it might be good to bring this up again is
>> that I am not sure clear on how painful the deprecation is; which
>> should be weighed against the benefit.  And the benefit here is only
>> moderate.
>>
>
> It will be painful as in "lots of churn", but the fixes are
> straightforward. And it's clear many knowledgeable users didn't know they
> were aliases, so there is something to gain here.
>
> Whether or not we revert the deprecation, I'd be in favor of improving the
> docs to answer the most common questions and pitfalls, like:
>
> - What happens when I use Python builtin types with the dtype keyword?
> - How do I check if something is an integer array? Or a NumPy or Python
> integer?
> - What are default integer, float and complex precisions on all platforms?
> - How do I iterate over all floating point dtypes when writing tests?
> - Which of the many equivalent dtypes should I prefer? --> use float64,
> not float_ or double
> - warning: float128 and float96 do not exist on all platforms
> -
> https://github.com/scikit-learn/scikit-learn/wiki/C-integer-types%3A-the-missing-manual
>
> Related: it's still easy to have things leak into the namespace
> unintentionally - `np.sys` and `np.os` exist too. I think we can probably
> clean those up without a deprecation, but we should write some more public
> API tests that prevent this kind of thing.
>
> Cheers,
> Ralf
>
>
>
>> Thus, with the things now in and a few more people exposed to it, if
>> anyone thinks its a bad idea or that we should delay, I am all ears.
>>
>> Cheers,
>>
>> Sebastian
>>
>>
>> > Chuck


Re: [Numpy-discussion] datetime64: Remove deprecation warning when constructing with timezone

2020-11-05 Thread Eric Wieser
Without weighing in yet on how I feel about the deprecation, you can see
some discussion about why this was originally deprecated in the PR that
introduced the warning:

https://github.com/numpy/numpy/pull/6453
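
For reference, the warning-free spelling today is to do the timezone
conversion before numpy ever sees the value, e.g. with the stdlib:

    from datetime import datetime, timezone, timedelta
    import numpy as np

    # Convert the aware local time to naive UTC, then hand it to numpy:
    local = datetime(2020, 11, 5, 16, 0,
                     tzinfo=timezone(timedelta(hours=2)))
    utc_naive = local.astimezone(timezone.utc).replace(tzinfo=None)
    print(np.datetime64(utc_naive, 'm'))  # 2020-11-05T14:00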

Eric

On Thu, Nov 5, 2020, 20:13 Noam Yorav-Raphael  wrote:

> Hi,
>
> I suggest removing the deprecation warning when constructing a datetime64
> with a timezone. For example, this is the current behavior:
>
> >>> np.datetime64('2020-11-05 16:00+0200')
> :1: DeprecationWarning: parsing timezone aware datetimes is
> deprecated; this will raise an error in the future
> numpy.datetime64('2020-11-05T14:00')
>
> I suggest removing the deprecation warning because I find this to be a
> useful behavior, and because it is a correct behavior. The manual says:
> "The datetime object represents a single moment in time... Datetimes are
> always stored based on POSIX time, with an epoch of 1970-01-01T00:00Z."
> So 2020-11-05T16:00+0200 is indeed the moment in time represented by
> np.datetime64('2020-11-05T14:00').
>
> I just used this to restrict my data set to records created after a
> certain moment. It was easier for me to write the moment in my local time
> and add "+0200" than to figure out the moment representation in UTC.
>
> So this is my simple suggestion: remove the deprecation warning.
>
>
> Beyond that, I have 3 ideas for changing the repr of datetime64 that I
> would like to discuss.
>
> 1. Add "Z" at the end, for example, numpy.datetime64('2020-11-05T14:00Z').
> This will make it clear to which moment it refers. I think this is
> significant - I had to dig quite a bit to realize that
> datetime64('2020-11-05T14:00') means 14:00 UTC.
>
> 2. Replace the 'T' with a space. I just find it much easier to read
> '2020-11-05 14:00Z' than '2020-11-05T14:00Z'. The long sequence of
> characters makes it hard for my brain to parse.
>
> 3. This will require discussion, but will be very convenient: have the
> repr display the time using the environment time zone, including a time
> offset. So, in my specific time zone (+0200), I will have:
>
> repr(np.datetime64('2020-11-05 14:00Z')) ==
> "numpy.datetime64('2020-11-05T16:00+0200')"
>
> I'm sure the pros and cons of having an environment-dependent repr should
> be discussed. But I will list some pros:
> 1. It's very convenient - it's immediately obvious to me to which moment
> 2020-11-05 16:00+0200 refers.
> 2. It's well defined - I may collect timestamps from machines with
> different time zones, and I will be able to know to which exact moment each
> timestamp refers.
> 3. It's very simple - I could compare any two timestamps, I don't have to
> worry about time zones.
>
> I would be happy to hear your thoughts.
>
> Thanks,
> Noam


Re: [Numpy-discussion] Request for comments on PEP 637 - Support for indexing with keyword arguments

2020-09-25 Thread Eric Wieser
I agree with Stephan's suggestion of having no default value for positional
indices, and letting the user supply it.

It seems I replied badly to the mail on the python-ideas list, and my
response ended up as a separate thread, at
https://mail.python.org/archives/list/python-id...@python.org/thread/ZQ4CHT37YUHYEFYFBQ423QNZMFCRULAW/
.
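
To make that concrete: under this scheme the class author picks their own
sentinel (a rough sketch; since the `obj[x=23]` syntax does not exist
yet, the call is spelled out explicitly):

    _MISSING = object()  # chosen by the class author, not the language

    class Labeled:
        def __getitem__(self, key=_MISSING, /, **kwargs):
            if key is _MISSING and not kwargs:
                raise TypeError("an index or a keyword argument is required")
            return key, kwargs

    obj = Labeled()
    # what `obj[x=23]` would resolve to under the proposal:
    print(type(obj).__getitem__(obj, x=23))  # (<sentinel>, {'x': 23})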

Eric

On Thu, 24 Sep 2020 at 22:22, Aaron Meurer  wrote:

> This PEP also opens the possibility of allowing a[] with nothing in the
> getitem. Has that been considered?
>
> Aaron Meurer
>
> On Thu, Sep 24, 2020 at 1:48 PM Sebastian Berg 
> wrote:
>
>> On Wed, 2020-09-23 at 21:41 -0700, Stephan Hoyer wrote:
>> > On Wed, Sep 23, 2020 at 2:22 PM Stefano Borini <
>> > stefano.bor...@gmail.com>
>> > wrote:
>>
>> 
>>
>> > >
>> > Hi Stefano -- thanks for pushing this proposal forward! I am sure
>> > that
>> > support for keyword indexing will be very welcome in the scientific
>> > Python
>> > ecosystem.
>> >
>> > I have not been following the full discussion on PEP 637, but I
>> > recall
>> > seeing another suggestion earlier for what this could be resolved
>> > into:
>> >
>> > type(obj).__getitem__(obj, x=23)
>> >
>> > I.e., not passing a positional argument at all.
>> >
>> > The author of a class that supports keyword indexing could check this
>> > sort
>> > of case with a positional only argument with a default value, e.g.,
>> >
>> > def __getitem__(self, key=MY_SENTINEL, /, **kwargs):
>> >
>> > where MY_SENTINEL could be any desired sentinel value, including
>> > either
>> > None or (). Is there a reason for rejecting this option? It seems
>> > like a
>> > nice explicit alternative to prespecifing the choice of sentinel
>> > value.
>> >
>> > I guess the concern might be that this would not suffice for
>> > __setitem__?
>> >
>> >
>> > >
>> > > You can see a detailed discussion in the PEP at L913
>>
>> 
>>
>> > I guess the case this would disallow is distinguishing between
>> > obj[None,
>> > x=23] and obj[x=23]?
>> >
>> > Yes, this could be a little awkward potentially. The tuple would
>> > definitely
>> > be more natural for NumPy users, given the that first step of
>> > __getitem__/__setitem__ methods in the broader NumPy ecosystem is
>> > typically
>> > packing non-tuple keys into a tuple, e.g.,
>> >
>> > def __getitem__(self, key):
>> > if not isinstance(key, tuple):
>> > key = (key,)
>> > ...
>> >
>> > That said:
>> > - NumPy itself is unlikely to support keyword indexing anytime soon.
>> > - New packages could encourage using explicit aliases like
>> > "np.newaxis"
>> > instead of "None", which in general is a best practice already.
>> > - The combined use of keyword indexing *and* insertion of new axes at
>> > the
>> > same time strikes me as something that would be unusual in practice.
>> > From
>> > what I've seen, it is most useful to either use entirely unnamed or
>> > entirely named axes. In the later case, I might write something like
>> > obj[x=None] to indicate inserting a new dimension with the name "x".
>> >
>>
>> Just to briefly second these points and the general support thanks for
>> the hard work!  I do also wonder about the `key=custom_default`
>> solution, or whether there may be other option to address this.
>>
>> For NumPy, I do hope we can play with the idea of adding support if
>> this PEP lands. But agree that labeled axes in NumPy is unlikely to be
>> on the immediate horizon, and I am not sure how feasible it is.
>>
>> My main concern in the discussion on python-ideas was `arr[]` doing something
>> unexpected, but it was long decided that it will remain a SyntaxError.
>>
>> For the question at hand, it seems to me that mixing labeled and
>> unlabeled indexing would be an error for array-like objects.
>> In that case, the worst we get seems a quirk where `arr[None, x=4]` is
>> not an error, when it should be an error.
>> That does not really strike me as a blocker.
>>
>> I am not a fan of such quirks. But some trade-off seems unavoidable
>> considering the backward compatibility constraints and differences
>> between array-likes and typing use of `__getitem__`.
>>
>> - Sebastian
>>
>>
>> > I think we could definitely live with it either way. I would lean
>> > towards
>> > using an empty tuple, but I agree that it feels a little uglier than
>> > using
>> > None (though perhaps not surprising, given the already ugly special
>> > cases
>> > for tuples in the indexing protocol).
>> >
>> > Best,
>> > Stephan

Re: [Numpy-discussion] number datetime64 dtypes

2020-09-10 Thread Eric Wieser
It's interesting to confirm that people are aware of this syntax!

This is intended but perhaps not useful behavior.

`datetime64[15D]` is a type that stores dates by the nearest date that is a
multiple of 15 days from the unix epoch.
Arguably there isn't a situation where using `15D` makes a whole lot of
sense, but the generalization is useful - `datetime64[15m]` stores dates
rounded to the nearest quarter hour, which is somewhat sensible.

Perhaps we should have added support for a custom epoch, which would make
your problem go away...
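
Spelled out, the existing behavior is (a demonstration, not a proposal):

    import numpy as np

    # Casting to datetime64[15D] snaps the date onto a 15-day grid
    # anchored at the unix epoch, not at the first value you supply:
    d = np.datetime64('2008-04-01').astype('datetime64[15D]')
    print(d)  # 2008-03-27, the grid point nearest 2008-04-01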

On Thu, 10 Sep 2020 at 18:43, Dr. Mark Alexander Mikofski PhD <
mikof...@berkeley.edu> wrote:

> Hi,
>
> Thank you for your time.
>
> A colleague asked me about creating a range of numpy datetime64 at 15-day
> increments.
>
> This works:
>
> np.arange(np.datetime64('2008-04-01'), np.datetime64('2020-09-01'), 
> np.timedelta64(15, 'D'))
>
> but then they also showed me this, which leads to some very strange
> responses:
>
> np.arange(np.datetime64('2008-04-01'), np.datetime64('2020-09-01'),
> dtype="datetime64[15D]")
> Out[50]:
> array(['2008-03-27', '2008-04-11', '2008-04-26', '2008-05-11',
>'2008-05-26', '2008-06-10', '2008-06-25', '2008-07-10',
> ...
>'2020-05-23', '2020-06-07', '2020-06-22', '2020-07-07',
>'2020-07-22', '2020-08-06'], dtype='datetime64[15D]')
>
> See how the 1st day is March 27th?
>
> I couldn't find a reference to this dtype ( "datetime64[15D]" ) in the
> numpy docs, but I think it's a common pattern in Pandas, that is using a
> number to get an increment of the frequency, for example "5T" is 5-minutes,
> etc.
>
> There is a reference to using arange with dtype on the datetimes &
> timedelta doc page () but the datetime is 1-day or  "datetime64[D]"
>
> Is this the intended outcome? Or is it a side effect?
>
> I wonder if others have tried to adapt Pandas patterns to Numpy datetimes,
> and if it's an issue for anyone else.
>
> I've advised my colleague not to use Numpy datetimes like this, assuming
> based on the docs that Pandas-style offsets do not translate into Numpy
> style datetimes.
>
> thanks!
>
> --
> Mark Mikofski, PhD (2005)
> *Fiat Lux*


Re: [Numpy-discussion] [Feature Request] Add alias of np.concatenate as np.concat

2020-06-06 Thread Eric Wieser
I agree with all of Ralf's points, except for perhaps this one:

> I also don't see a reason to conform to TensorFlow (or PyTorch, or
> Matlab, or whichever other library)

Python itself has a name for this function, `operator.concat` - so _maybe_
this sets a strong enough precedent for us to add an alias.

But we already diverge from the standard library on things like
`np.remainder` vs `math.remainder` - so my feeling is this still isn't
worth it.
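
(For reference, the stdlib name operates on sequences, not arrays:)

    from operator import concat
    import numpy as np

    print(concat([1, 2], [3]))            # [1, 2, 3] - list concatenation
    print(np.concatenate([[1, 2], [3]]))  # array([1, 2, 3])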

Eric

On Sat, Jun 6, 2020, 15:21 Ralf Gommers  wrote:

>
>
> On Sat, Jun 6, 2020 at 2:58 PM Adrin  wrote:
>
>> This somehow also reminds me of the `__array_module__` (NEP37) protocol.
>>
>> I'm not sure if TF would ever implement it, but it would be really nice
>> if the NEP37 proposal
>> would move forward and libraries would implement it.
>>
>
> There is a plan to move forward with the various proposals on the array
> protocol front:
> https://github.com/numpy/archive/blob/master/other_meetings/2020-04-21-array-protocols_discussion_and_notes.md
>
> At this point I think it needs work to implement and exercise the
> alternatives, rather than a decision.
>
>
>> On Mon, Jun 1, 2020 at 8:22 PM Iordanis Fostiropoulos <
>> danny.fostiropou...@gmail.com> wrote:
>>
>>> In regard to Feature Request:
>>> https://github.com/numpy/numpy/issues/16469
>>>
>>> It was suggested to sent to the mailing list. I think I can make a
>>> strong point as to why the support for this naming convention would make
>>> sense. Such as it would follow other frameworks that often work alongside
>>> numpy such as tensorflow. For backward compatibility, it can simply be an
>>> alias to np.concatenate
>>>
>>> I often convert portions of code from tf to np, it is as simple as
>>> changing the base module from tf to np. e.g. np.expand_dims ->
>>> tf.expand_dims. This is done either in debugging (e.g. converting tf to np
>>> without eager execution to debug portion of the code), or during
>>> prototyping, e.g. develop in numpy and convert in tf.
>>>
>>> I find myself more than at one occasion to getting syntax errors because
>>> of this particular function np.concatenate. It is unnecessarily long. I
>>> imagine there are more people that also run into the same problems. Pandas
>>> uses concat (torch on the other extreme uses simply cat, which I don't
>>> think is as descriptive).
>>>
>>
> I don't think this is a good idea. We have a lot of poor function and
> object names,
> adding aliases for those isn't a healthy idea. `concatenate` is a good,
> descriptive name.
> Adding an alias for it just gives two equivalent ways of calling the same
> functionality,
> puts an extra burden on other libraries that want to be numpy-compatible,
> puts an extra burden on users that now see two similar function names
> (e.g. with
> tab completion) that they then need to look up to decide which one to use,
> and generally sets a bad precedent.
>
> Saving five characters is not a good enough reason to add an alias.
>
> I also don't see a reason to conform to TensorFlow (or PyTorch, or Matlab,
> or
> whichever other library). If we're adding a new function then yes by all
> means
> look at prior art, but here we have 15 years of existing uses of a
> sensibly named
> function.
>
> Cheers,
> Ralf
>
>
>


Re: [Numpy-discussion] log of negative real numbers -> RuntimeWarning: invalid value encountered in log

2020-05-25 Thread Eric Wieser
One explanation for this behavior is that doing otherwise would be slow.

Consider an array like

arr = np.array([1]*10**6 + [-1])
ret = np.log(arr)

Today, what happens is:

   - The output array is allocated as np.double
   - The input array is iterated over, and log evaluated on each element in
   turn

For what you describe to happen, the behavior would have to be either:

   - The output array is allocated as np.double
   - The input array is iterated over, and log evaluated on each element
     in turn
   - If any negative element is encountered, allocate a new array as
     np.cdouble, copy all the data over, then continue. This results in
     the whole array being promoted.

or:

   - The input array is iterated over, and checked to see if all the
     values are positive
   - The output array is allocated as np.double or np.cdouble based on
     this result
   - The input array is iterated over, and log evaluated on each element
     in turn

In either case, you’ve converted a 1-pass iteration to a 2-pass one.

There are static-typing-based explanations for this behavior too, but I’ll
let someone else present one of those.
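
For completeness, the explicit opt-in available today is a single cast:

    import numpy as np

    arr = np.array([1.0, -1.0])
    print(np.log(arr))                     # [ 0. nan], plus a RuntimeWarning
    print(np.log(arr.astype(np.cdouble)))  # [0.+0.j 0.+3.14159265j]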

Eric

On Mon, 25 May 2020 at 14:33, Brian Racey  wrote:

> Why does numpy produce a runtime warning (invalid value encountered in
> log) when taking the log of a negative number? I noticed that if you coerce
> the argument to complex by adding 0j to the negative number, the expected
> result is produced (i.e. ln(-1) = pi*i).
>
> I was surprised I couldn't find a discussion on this, as I would have
> expected others to have come across this before. Packages like Matlab
> handle negative numbers automatically by doing the complex conversion.


Re: [Numpy-discussion] Using nditer + external_loop to Always Iterate by Column

2020-05-19 Thread Eric Wieser
Hi Will,

To force an iteration to run along certain axes, I believe you should be
using `op_axes`. Your diagnosis is correct that `external_loop` is trying
to help you be more optimal, since its purpose is exactly that:
optimization.

Unfortunately, if you use `op_axes` you'll run into
https://github.com/numpy/numpy/issues/9808.
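
Until that is fixed, a plain-Python workaround that sidesteps nditer
entirely is to iterate over the transpose, which yields each column as a
1-D view regardless of contiguity:

    import numpy as np

    a = np.arange(3).reshape(1, 3)
    for col in a.T:   # rows of a.T are columns of a
        print(col)    # [0]  [1]  [2]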

Eric

On Tue, 19 May 2020 at 00:42, William Ayd  wrote:

> I am trying to use the nditer to traverse each column of a 2D array,
> returning the column as a 1D array. Consulting the docs, I found this
> example which works perfectly fine:
>
> In [65]: a = np.arange(6).reshape(2,3)
>
> In [66]: for x in np.nditer(a, flags=['external_loop'], order='F'):
> ...: print(x, end=' ')
> ...:
>
> [0 3] [1 4] [2 5]
>
> When changing the shape of the input array to (1, 3) however, this doesn’t
> yield what I am hoping for any more (essentially [0], [1] [2]):
>
> In [68]: for x in np.nditer(a, flags=['external_loop'], order='F'):
> ...: print(x, end=' ')
> ...:
>
> [0 1 2]
>
> I suspect this may have to do with the fact that the (1, 3) array is both
> C and F contiguous, and it is trying to return as large of a 1D
> F-contiguous array as it can. However, I didn’t see any way to really force
> it to go by columns. My best guess was the `itershape` argument though I
> couldn’t figure out how to get that to work and didn’t see much in the
> documentation.
>
> Thanks in advance for the help!
>
> - Will
>
>
>


Re: [Numpy-discussion] Deprecate Promotion of numbers to strings?

2020-04-30 Thread Eric Wieser
> Another larger visible change will be code such as:
>
> np.concatenate([np.array(["string"]), np.array([2])])
>
> will result in an error instead of returning a string array. (Users
> will have to cast manually here.)

I wonder if we can lessen the blow by allowing
`np.concatenate([np.array(["string"]), np.array([2])], casting='unsafe',
dtype=str)` or similar in its place.
It seems a little unfortunate that with this change, we lose the ability to
concatenate numbers to strings without making intermediate copies.
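
A sketch of that call (assuming `concatenate` accepts `dtype=` and
`casting=` keywords, which it later gained in NumPy 1.20):

    import numpy as np

    # With an explicit output dtype there is no intermediate string copy:
    out = np.concatenate([np.array(["string"]), np.array([2])],
                         dtype="U6", casting="unsafe")
    print(out)  # ['string' '2']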

Eric



On Thu, 30 Apr 2020 at 18:32, Sebastian Berg 
wrote:

> Hi all,
>
> in https://github.com/numpy/numpy/pull/15925 I propose to deprecate
> promotion of strings and numbers. I have to double check whether this
> has a large effect on pandas, but it currently seems to me that it will
> be reasonable.
>
> This means that `np.promote_types("S", "int8")`, etc. will lead to an
> error instead of returning `"S4"`.  For the user, I believe the two
> main visible changes are that:
>
> np.array(["string", 0])
>
> will stop creating a string array and return either an `object` array
> or give an error (object array would be the default currently).
>
> Another larger visible change will be code such as:
>
> np.concatenate([np.array(["string"]), np.array([2])])
>
> will result in an error instead of returning a string array. (Users
> will have to cast manually here.)
>
> The alternative is to return an object array also for the concatenate
> example.  I somewhat dislike that because `object` is not homogeneously
> typed and we thus lose type information.  This also affects functions
> that wish to cast inputs to a common type (ufuncs also do this
> sometimes).
> A further example of this and discussion is at the end of the mail [1].
>
>
> So the first question is whether we can form an agreement that an error
> is the better choice for `concatenate` and `np.promote_types()`.
> I.e. there is no one dtype that can faithfully represent both strings
> and integers. (This is currently the case e.g. for datetime64 and
> float64.)
>
>
> The second question is what to do for:
>
> np.array(["string", 0])
>
> which currently always returns strings.  Arguably, it must also either
> return an `object` array, or raise an error (requiring the user to pick
> string or object using `dtype=object`).
>
> The default would be to create a FutureWarning that an `object` array
> will be returned for `np.asarray(["string", 0])` in the future.
> But if we know already that we prefer an error, it would be better to
> give a DeprecationWarning right away. (It just does not seem nice to
> change the same thing twice even if the workaround is identical.)
>
> Cheers,
>
> Sebastian
>
>
> [1]
>
> A second more in-depth point is that code such as:
>
> common_dtype = np.result_type(arr1, arr2)  # or promote_types
> arr1 = arr1.astype(common_dtype, copy=False)
> arr2 = arr2.astype(common_dtype, copy=False)
>
> will currently use `string` in this case while it would error in the
> future. This already fails with other type combinations such as
> `datetime64` and `float64` at the moment.
>
> The main alternative to this proposal is to return `object` for the
> common dtype, since an object array is not homogeneously typed, it
> arguably can represent both inputs.  I do not quite like this choice
> personally because in the above example, it may be that the next line
> is something like:
>
> return arr1 * arr2
>
> in which case, the preferred return may be `str` and not `object`.
> We currently never promote to `object` unless one of the arrays is
> already an `object` array, and that seems like the right choice to me.
>
>


Re: [Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

2020-04-24 Thread Eric Wieser
Perhaps worth mentioning that we've discussed this sort of API before, in
https://github.com/numpy/numpy/pull/11897.

Under that proposal, the api would be something like:

* `copy=True` - always copy, like it is today
* `copy=False` - copy if needed, like it is today
* `copy=np.never_copy` - never copy, throw an exception if not possible

I think the discussion stalled on the precise spelling of the third option.

`__array__` was not discussed there, but it seems like adding the `copy`
argument to `__array__` would be a perfectly reasonable extension.
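
A sketch of what a duck array might then implement (the `copy=` parameter
here is the proposed extension, not current API):

    import numpy as np

    class Wrapper:
        def __init__(self, data):
            self._data = np.asarray(data)

        def __array__(self, dtype=None, copy=None):
            # copy=None: copy only if needed; True: always; False: never
            arr = (self._data if dtype is None
                   else self._data.astype(dtype, copy=False))
            if copy:
                return arr.copy()
            if copy is False and arr is not self._data:
                raise ValueError("a copy was required but forbidden")
            return arr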

Eric

On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias  wrote:

> Hi everyone,
>
> One bit of expressivity we would miss is “copy if necessary, but otherwise
>> don’t bother”, but there are workarounds to this.
>>
>
> After a side discussion with Stéfan van der Walt, we came up with
> `allow_copy=True`, which would express to the downstream library that we
> don’t mind waiting, but that zero-copy would also be ok.
>
> This sounds like the sort of thing that is use case driven. If enough
> projects want to use it, then I have no objections to adding the keyword.
> OTOH, we need to be careful about adding too many interoperability tricks
> as they complicate the code and makes it hard for folks to determine the
> best solution. Interoperability is a hot topic and we need to be careful
> not put too leave behind too many experiments in the NumPy code.  Do you
> have any other ideas of how to achieve the same effect?
>
>
> Personally, I don’t have any other ideas, but would be happy to hear some!
>
> My view regarding API/experiment creep is that `__array__` is the oldest
> and most basic of all the interop tricks and that this can be safely
> maintained for future generations. Currently it only takes `dtype=` as a
> keyword argument, so it is a very lean API. I think this particular use
> case is very natural and I’ve encountered the reluctance to implicitly copy
> twice, so I expect it is reasonably common.
>
> Regarding difficulty in determining the best solution, I would be happy to
> contribute to the dispatch basics guide together with the new kwarg. I
> agree that the protocols are getting quite numerous and I couldn’t find a
> single place that gathers all the best practices together. But, to
> reiterate my point: `__array__` is the simplest of these and I think this
> keyword is pretty safe to add.
>
> For ease of discussion, here are the API options discussed so far, as well
> as a few extra that I don’t like but might trigger other ideas:
>
> np.asarray(my_duck_array, allow_copy=True)  # default is False, or None ->
> leave it to the duck array to decide
> np.asarray(my_duck_array, copy=True)  # always copies, but, if supported
> by the duck array, defers to it for the copy
> np.asarray(my_duck_array, copy=‘allow’)  # could take values ‘allow’,
> ‘force’, ’no’, True(=‘force’), False(=’no’)
> np.asarray(my_duck_array, force_copy=False, allow_copy=True)  # separate
> concepts, but unclear what force_copy=True, allow_copy=False means!
> np.asarray(my_duck_array, force=True)
>
> Juan.


Re: [Numpy-discussion] Is `numpy.lib.shape_base.normalize_axis_index` considered part of the public API?

2020-04-06 Thread Eric Wieser
When I added this function, it was always my intent for it to be consumed
by downstream packages, but as Sebastian remarks, it wasn't really
desirable to put it in the top-level namespace.

I think I would be reasonably happy to make the guarantee that it would not
be removed (or more likely, moved) without a lengthy deprecation cycle.

Perhaps worth opening a github issue, so we can keep track of how many
downstream projects are already using it.
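
For reference, current usage looks like:

    from numpy.core.multiarray import normalize_axis_index

    print(normalize_axis_index(-1, 3))  # 2 - negative axes are wrapped
    print(normalize_axis_index(2, 3))   # 2 - in-range axes pass through
    # normalize_axis_index(3, 3) raises np.AxisError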

Eric

On Sun, 5 Apr 2020 at 15:06, Sebastian Berg 
wrote:

> On Sun, 2020-04-05 at 00:43 -0400, Warren Weckesser wrote:
> > On 4/4/20, Warren Weckesser  wrote:
> > > It would be handy if in scipy we can use the function
> > > `numpy.lib.shape_base.normalize_axis_index` as a consistent method
> > > for
> > > validating an `axis` argument.  Is this function considered part of
> > > the public API?
> > >
> > > There are modules in numpy that do not have leading underscores but
> > > are still usually considered private.  I'm not sure if
> > > `numpy.lib.shape_base` is one of those.  `normalize_axis_index` is
> > > not
> > > in the top-level `numpy` namespace, and it is not included in the
> > > API
> > > reference
> > > (
> > >
> https://numpy.org/devdocs/search.html?q=normalize_axis_index&check_keywords=yes&area=default
> > > ),
> > > so I'm not sure if we can safely consider this function to be
> > > public.
> > >
>
> I do not see a reason why we should not make those functions public.
> The only thing I see is that they are maybe not really required in the
> main namespace, i.e. you can be expected to use::
>
> from numpy.something import normalize_axis_tuple
>
> I think, since this is a function for library authors more than end-
> users. And we do not have much prior art around where to put something
> like that.
>
> Cheers,
>
> Sebastian
>
>
>
> > > Warren
> > >
> >
> > Answering my own question:
> >
> > "shape_base.py" is not where `normalize_axis_index` is originally
> > defined, so that module can be ignored.
> >
> > The function is actually defined in `numpy.core.multiarray`.  The
> > pull
> > request in which the function was created is
> > https://github.com/numpy/numpy/pull/8584. Whether or not the function
> > was to be public is discussed starting here:
> > https://github.com/numpy/numpy/pull/8584#issuecomment-281179399.  A
> > leading underscore was discussed and intentionally not added to the
> > function.  On the other hand, it was not added to the top-level
> > namespace, and Eric Wieser wrote "Right now, it is only accessible
> > via
> > np.core.multiarray.normalize_axis_index, so yes, an internal
> > function".
> >
> > There is another potentially useful function, `normalize_axis_tuple`,
> > defined in `numpy.core.numeric`.  This function is also not in the
> > top-level numpy namespace.
> >
> > So it looks like neither of these functions is currently intended to
> > be public. For the moment, I think we'll create our own utility
> > functions in scipy.  We can switch to using the numpy functions if
> > those functions are ever intentionally made public.
> >
> > Warren


[Numpy-discussion] Deprecating ndarray.tostring()

2020-03-30 Thread Eric Wieser
Hi all,

Just a heads up that in https://github.com/numpy/numpy/pull/15867 I plan to
deprecate ndarray.tostring(), which is just a confusing way to spell
ndarray.tobytes().

This function has been documented as a compatibility alias since NumPy 1.9,
but never emitted a warning upon use.
Given array.array.tostring() has been warning as far back as Python 3.1, it
seems like we should follow suit.

In order to reduce the impact of such a deprecation, I’ve filed the
necessary scipy PR: https://github.com/scipy/scipy/pull/11755.
It’s unlikely we’ll remove this function entirely any time soon, but the
act of deprecating it may cause a few failing CI runs in projects where
warnings are turned into errors.
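
The rename itself is purely mechanical:

    import numpy as np

    a = np.arange(3, dtype=np.uint8)
    assert a.tobytes() == a.tostring() == b'\x00\x01\x02'
    # after this change, only the tostring() spelling will warn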

Eric


Re: [Numpy-discussion] Put type annotations in NumPy proper?

2020-03-24 Thread Eric Wieser
>  Putting
> aside ndarray, as more challenging, even annotations for numpy functions
> and method parameters with built-in types would help, as a start.

This is a good idea in principle, but one thing concerns me.

If we add type annotations to numpy, does it become an error to have
numpy-stubs installed?
That is, is this an all-or-nothing thing where as soon as we start,
numpy-stubs becomes unusable?

Eric

On Tue, 24 Mar 2020 at 17:28, Roman Yurchak  wrote:

> Thanks for re-starting this discussion, Stephan! I think there is
> definitely significant interest in this topic:
> https://github.com/numpy/numpy/issues/7370 is the issue with the largest
> number of user likes in the issue tracker (FWIW).
>
> Having them in numpy, as opposed to a separate numpy-stubs repository
> would indeed be ideal from a user perspective. When looking into it in
> the past, I was never sure how well in sync numpy-stubs was. Putting
> aside ndarray, as more challenging, even annotations for numpy functions
> and method parameters with built-in types would help, as a start.
>
> To add to the previously listed projects that would benefit from this,
> we are currently considering to start using some (minimal) type
> annotations in scikit-learn.
>
> --
> Roman Yurchak
>
> On 24/03/2020 18:00, Stephan Hoyer wrote:
> > When we started numpy-stubs [1] a few years ago, putting type
> > annotations in NumPy itself seemed premature. We still supported Python
> > 2, which meant that we would need to use awkward comments for type
> > annotations.
> >
> > Over the past few years, using type annotations has become increasingly
> > popular, even in the scientific Python stack. For example, off-hand I
> > know that at least SciPy, pandas and xarray have at least part of their
> > APIs type annotated. Even without annotations for shapes or dtypes, it
> > would be valuable to have near complete annotations for NumPy, the
> > project at the bottom of the scientific stack.
> >
> > Unfortunately, numpy-stubs never really took off. I can think of a few
> > reasons for that:
> > 1. Missing high level guidance on how to write type annotations,
> > particularly for how (or if) to annotate particularly dynamic parts of
> > NumPy (e.g., consider __array_function__), and whether we should
> > prioritize strictness or faithfulness [2].
> > 2. We didn't have a good experience for new contributors. Due to the
> > relatively low level of interest in the project, when a contributor
> > would occasionally drop in, I often didn't even notice their PR for a
> > few weeks.
> > 3. Developing type annotations separately from the main codebase makes
> > them a little harder to keep in sync. This means that type annotations
> > couldn't serve their typical purpose of self-documenting code. Part of
> > this may be necessary for NumPy (due to our use of C extensions), but
> > large parts of NumPy's user facing APIs are written in Python. We no
> > longer support Python 2, so at least we no longer need to worry about
> > putting annotations in comments.
> >
> > We eventually could probably use a formal NEP (or several) on how we
> > want to use type annotations in NumPy, but I think a good first step
> > would be to think about how to start moving the annotations from
> > numpy-stubs into numpy proper.
> >
> > Any thoughts? Anyone interested in taking the lead on this?
> >
> > Cheers,
> > Stephan
> >
> > [1] https://github.com/numpy/numpy-stubs
> > [2] https://github.com/numpy/numpy-stubs/issues/12
> >


Re: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules

2020-02-05 Thread Eric Wieser
>  scipy.linalg is a superset of numpy.linalg

This isn't completely accurate - numpy.linalg supports almost all
operations* over stacks of matrices via gufuncs, but scipy.linalg does not
appear to.
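
For example:

    import numpy as np

    # numpy.linalg gufuncs broadcast over stacks of matrices:
    stack = np.eye(3) * np.arange(1, 6)[:, None, None]  # shape (5, 3, 3)
    print(np.linalg.inv(stack).shape)  # (5, 3, 3) - one inverse per matrix
    print(np.linalg.det(stack).shape)  # (5,)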

Eric

*: not lstsq due to an ungeneralizable public API

On Wed, 5 Feb 2020 at 17:38, Ralf Gommers  wrote:

>
>
> On Wed, Feb 5, 2020 at 10:01 AM Andreas Mueller  wrote:
>
>> A bit late to the NEP 37 party.
>> I just wanted to say that at least from my perspective it seems a great
>> solution that will help sklearn move towards more flexible compute engines.
>> I think one of the biggest issues is array creation (including random
>> arrays), and that's handled quite nicely with NEP 37.
>>
>> There's some discussion on the scikit-learn side here:
>> https://github.com/scikit-learn/scikit-learn/pull/14963
>> https://github.com/scikit-learn/scikit-learn/issues/11447
>>
>> Two different groups of people tried to use __array_function__ to
>> delegate to MxNet and CuPy respectively in scikit-learn, and ran into the
>> same issues.
>>
>> There's some remaining issues in sklearn that will not be handled by NEP
>> 37 but they go beyond NumPy in some sense.
>> Just to briefly bring them up:
>>
>> - We use scipy.linalg in many places, and we would need to do a separate
>> dispatching to check whether we can use module.linalg instead
>>  (that might be an issue for many libraries but I'm not sure).
>>
>
> That is an issue, and goes in the opposite direction we need -
> scipy.linalg is a superset of numpy.linalg, so we'd like to encourage using
> scipy. This is something we may want to consider fixing by making the
> dispatch decorator public in numpy and adopting in scipy.
>
> Cheers,
> Ralf
>
>
>
>>
>> - Some models have several possible optimization algorithms, some of
>> which are pure numpy and some which are Cython. If someone provides a
>> different array module,
>>  we might want to choose an algorithm that is actually supported by that
>> module. While this exact issue is maybe sklearn specific, a similar issue
>> could appear for most downstream libs that use Cython in some places.
>>  Many Cython algorithms could be implemented in pure numpy with a
>> potential slowdown, but once we have NEP 37 there might be a benefit to
>> having a pure NumPy implementation as an alternative code path.
>>
>>
>> Anyway, NEP 37 seems a great step in the right direction and would enable
>> sklearn to actually dispatch in some places. Dispatching just based on
>> __array_function__ seems not really feasible so far.
>>
>> Best,
>> Andreas Mueller
>>
>>
>> On 1/6/20 11:29 PM, Stephan Hoyer wrote:
>>
>> I am pleased to present a new NumPy Enhancement Proposal for discussion:
>> "NEP-37: A dispatch protocol for NumPy-like modules." Feedback would be
>> very welcome!
>>
>> The full text follows. The rendered proposal can also be found online at
>> https://numpy.org/neps/nep-0037-array-module.html
>>
>> Best,
>> Stephan Hoyer
>>
>> ===
>> NEP 37 — A dispatch protocol for NumPy-like modules
>> ===
>>
>> :Author: Stephan Hoyer 
>> :Author: Hameer Abbasi
>> :Author: Sebastian Berg
>> :Status: Draft
>> :Type: Standards Track
>> :Created: 2019-12-29
>>
>> Abstract
>> 
>>
>> NEP-18's ``__array_function__`` has been a mixed success. Some projects
>> (e.g.,
>> dask, CuPy, xarray, sparse, Pint) have enthusiastically adopted it. Others
>> (e.g., PyTorch, JAX, SciPy) have been more reluctant. Here we propose a
>> new
>> protocol, ``__array_module__``, that we expect could eventually subsume
>> most
>> use-cases for ``__array_function__``. The protocol requires explicit
>> adoption
>> by both users and library authors, which ensures backwards compatibility,
>> and
>> is also significantly simpler than ``__array_function__``, both of which
>> we
>> expect will make it easier to adopt.
>>
>> Why ``__array_function__`` hasn't been enough
>> -
>>
>> There are two broad ways in which NEP-18 has fallen short of its goals:
>>
>> 1. **Maintainability concerns**. `__array_function__` has significant
>>implications for libraries that use it:
>>
>>- Projects like `PyTorch
>>  `_, `JAX
>>  `_ and even `scipy.sparse
>>  `_ have been reluctant
>> to
>>  implement `__array_function__` in part because they are concerned
>> about
>>  **breaking existing code**: users expect NumPy functions like
>>  ``np.concatenate`` to return NumPy arrays. This is a fundamental
>>  limitation of the ``__array_function__`` design, which we chose to
>> allow
>>  overriding the existing ``numpy`` namespace.
>>- ``__array_function__`` currently requires an "all or nothing"
>> approach to
>>  implementing NumPy's API. There is no good pathway for **incrementa

Re: [Numpy-discussion] Adding an nd generalization of np.ma.mask_rowscols

2020-01-17 Thread Eric Wieser
> IMHO, masked arrays and extending masks like that is a weird API.

To give some context, I needed this nd generalization internally in order
to fix the issue on these lines
<https://github.com/numpy/numpy/blob/ccfbcc1cd9a4035a467f2e982a565ab27de25b6b/numpy/ma/core.py#L7464-L7468>
.

> I would prefer a more functional approach

Can you elaborate on that with an example of the API you’d prefer instead
of np.ma.mask_extend_axis(a, axis=(0, 1))?

On Fri, 17 Jan 2020 at 15:30, Hameer Abbasi 
wrote:

> IMHO, masked arrays and extending masks like that is a weird API. I would
> prefer a more functional approach: Where we take in an input 1-D or N-D
> boolean array in addition to a masked array with multiple axes over which
> to extend the mask.
>
>
>
> *From:* NumPy-Discussion on behalf of Eric Wieser
> 
> *Reply to:* Discussion of Numerical Python
> *Date:* Friday, 17 January 2020 at 11:40
> *To:* Discussion of Numerical Python
> *Subject:* [Numpy-discussion] Adding an nd generalization of
> np.ma.mask_rowcols
>
> [Eric's original proposal quoted here in full - see the next message in
> this archive for the same text]


[Numpy-discussion] Adding an nd generalization of np.ma.mask_rowscols

2020-01-17 Thread Eric Wieser
Today, numpy has a np.ma.mask_rowcols function, which stretches masks along
the full length of an axis. For example, given the matrix::

>>> a2d = np.zeros((3, 3), dtype=int)
>>> a2d[1, 1] = 1
>>> a2d = np.ma.masked_equal(a2d, 1)
>>> print(a2d)
[[0 0 0]
 [0 -- 0]
 [0 0 0]]

The API allows::

>>> print(np.ma.mask_rowcols(a2d, axis=0))
[[0 0 0]
 [-- -- --]
 [0 0 0]]

>>> print(np.ma.mask_rowcols(a2d, axis=1))
[[0 -- 0]
 [0 -- 0]
 [0 -- 0]]

>>> print(np.ma.mask_rowcols(a2d, axis=None))
[[0 -- 0]
 [-- -- --]
 [0 -- 0]]

However, this function only works for 2D arrays.
It would be useful to generalize this to work on ND arrays as well.

Unfortunately, the current function is messy to generalize, because axis=0
means “spread the mask along axis 1”, and vice versa. Additionally, the
name is not particularly good for an ND function.

My proposal in PR 14998 <https://github.com/numpy/numpy/pull/14998> is to
introduce a new function, mask_extend_axis, which fixes this shortcoming.
Given a 3D array::

>>> a3d = np.zeros((2, 2, 2), dtype=int)
>>> a3d[0, 0, 0] = 1
>>> a3d = np.ma.masked_equal(a3d, 1)
>>> print(a3d)
[[[-- 0]
  [0 0]]

 [[0 0]
  [0 0]]]

This, in my opinion, has clearer axis semantics:

>>> print(np.ma.mask_extend_axis(a3d, axis=0))
[[[-- 0]
  [0 0]]

 [[-- 0]
  [0 0]]]

>>> print(np.ma.mask_extend_axis(a3d, axis=1))
[[[-- 0]
  [-- 0]]

 [[0 0]
  [0 0]]]

>>> print(np.ma.mask_extend_axis(a3d, axis=2))
[[[-- --]
  [0 0]]

 [[0 0]
  [0 0]]]

Stretching over multiple axes remains possible:

>>> print(np.ma.mask_extend_axis(a3d, axis=(1, 2)))
[[[-- --]
  [-- 0]]

 [[0 0]
  [0 0]]]

# extending sequentially is not the same as extending in parallel
>>> print(np.ma.mask_extend_axis(np.ma.mask_extend_axis(a3d, axis=1), axis=2))
[[[-- --]
  [-- --]]

 [[0 0]
  [0 0]]]

Questions for the mailing list then:

   - Can you think of a better name than mask_extend_axis?
   - Does my proposed meaning of axis make more sense to you than the one
   used by mask_rowcols?


Re: [Numpy-discussion] argmax() indexes to value

2019-11-01 Thread Eric Wieser
> On my system plain fancy indexing is fastest

Hardly surprising, since take_along_axis is doing that under the hood,
after constructing the index for you :)

https://github.com/numpy/numpy/blob/v1.17.0/numpy/lib/shape_base.py#L58-L172

I deliberately didn't expose the internal function that constructs the
slice, since leaving it private frees us to move those functions to C or,
in the distant future, to gufuncs.


On Fri, Nov 1, 2019, 15:54 Allan Haldane  wrote:

> my thought was to try `take` or `take_along_axis`:
>
>ind = np.argmin(a, axis=1)
>np.take_along_axis(a, ind[:,None], axis=1)
>
> But those functions tend to simply fall back to fancy indexing, and are
> pretty slow. On my system plain fancy indexing is fastest:
>
> >>> %timeit a[np.arange(N),ind]
> 1.58 µs ± 18.1 ns per loop
> >>> %timeit np.take_along_axis(a, ind[:,None], axis=1)
> 6.49 µs ± 57.3 ns per loop
> >>> %timeit np.min(a, axis=1)
> 9.51 µs ± 64.1 ns per loop
>
> Probably `take_along_axis` was designed with uses like yours in mind,
> but it is not very optimized.
>
> (I think numpy is lacking a category of efficient
> indexing/search/reduction functions, like 'findfirst', 'groupby',
> short-circuiting any_*/all_*/nonzero, the proposed oindex/vindex, better
> gufunc broadcasting. There is slow but gradual infrastructure work
> towards these, potentially).
>
> Cheers,
> Allan
>
> On 10/30/19 11:31 PM, Daniele Nicolodi wrote:
> > On 30/10/2019 19:10, Neal Becker wrote:
> >> max(axis=1)?
> >
> > Hi Neal,
> >
> > I should have been more precise in stating the problem. Getting the
> > values in the array for which I'm looking at the maxima is only one step
> > in a more complex piece of code for which I need the indexes along the
> > second axis of the array. I would like to avoid to have to iterate the
> > array more than once.
> >
> > Thank you!
> >
> > Cheers,
> > Dan
> >
> >
> >> On Wed, Oct 30, 2019, 7:33 PM Daniele Nicolodi  >> > wrote:
> >>
> >> Hello,
> >>
> >> this is a very basic question, but I cannot find a satisfying
> answer.
> >> Assume a is a 2D array and that I get the index of the maximum value
> >> along the second dimension:
> >>
> >> i = a.argmax(axis=1)
> >>
> >> Is there a better way to get the value of the maximum array entries
> >> along the second axis other than:
> >>
> >> v = a[np.arange(len(a)), i]
> >>
> >> ??
> >>
> >> Thank you.
> >>
> >> Cheers,
> >> Daniele


Re: [Numpy-discussion] Problem with np.savetxt

2019-10-10 Thread Eric Wieser
You're trying to read a file literally named `${d}.log`, which is unlikely
to be the name of your file. `${d}` is bash syntax, not Python syntax, so
Python never expands it.

This has drifted out of numpy territory and into "how to coordinate
between bash and Python" territory - I'd recommend asking this of a wider
Python audience on StackOverflow, where you'll get a faster response.
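
For what it's worth, the conventional fix is to let bash pass the name in
as an argument and read it from sys.argv - a sketch:

    # DeltaGTable_V_sl.py - invoked as:  python3 DeltaGTable_V_sl.py "${d}"
    import sys
    import numpy as np

    stem = sys.argv[1]  # bash has already expanded ${d} by this point
    data = np.genfromtxt(stem + ".log", usecols=(1,), skip_header=27,
                         skip_footer=1, encoding=None)
    np.savetxt(stem + ".dG", data, fmt="%12.9f", header=stem)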

Eric

On Thu, 10 Oct 2019 at 15:11, Stephen P. Molnar 
wrote:

> I am slowly and not quickly stumbling forward, but at this point my
> degree of mental entropy (confusion) is monumental.
>
> This works:
>
> > import numpy as np
> >
> > print('${d}')
> >
> > data = np.genfromtxt("14-7.log", usecols=(1), skip_header=27,
> > skip_footer=1, encoding=None)
> >
> > print(data)
> >
> > np.savetxt('14-7.dG', data, fmt='%12.9f', header='14-7')
> > print(data)
>
> which produces:
>
> > runfile('/home/comp/Apps/Python/PsoVina/DeltaGTable_V_s.py',
> > wdir='/home/comp/Apps/Python/PsoVina', current_namespace=True)
> > ${d}
> > [-9.96090267 -8.97950478 -8.94261136 -8.91552301 -8.73650883 -8.66338714
> >  -8.41073971 -8.38914635 -8.29679891 -8.16845411 -8.12799082 -8.12710377
> >  -7.97909074 -7.94187268 -7.90076621 -7.88148523 -7.83782648 -7.8159095
> >  -7.72254029 -7.72034674]
> > [-9.96090267 -8.97950478 -8.94261136 -8.91552301 -8.73650883 -8.66338714
> >  -8.41073971 -8.38914635 -8.29679891 -8.16845411 -8.12799082 -8.12710377
> >  -7.97909074 -7.94187268 -7.90076621 -7.88148523 -7.83782648 -7.8159095
> >  -7.72254029 -7.72034674]
> Note;  the print statements are for a quick check o the output, which is:
>
> > # 14-7
> > -9.960902669
> > -8.979504781
> > -8.942611364
> > -8.915523010
> > -8.736508831
> > -8.663387139
> > -8.410739711
> > -8.389146347
> > -8.296798909
> > -8.168454106
> > -8.127990818
> > -8.127103774
> > -7.979090739
> > -7.941872682
> > -7.900766215
> > -7.881485228
> > -7.837826485
> > -7.815909505
> > -7.722540286
> > -7.720346742
>   Also, this bash script works:
>
> > #!/bin/bash
> >
> > # Run.dG.list_1
> >
> > while IFS= read -r d
> > do
> > echo "${d}.log"
> >
> > done  which returns the three log file names:
>
> > 14-7.log
> > 15-7.log
> > 18-7.log
> > C-VX3.log
>
>
> But, if I run this bash script:
>
> > #!/bin/bash
> >
> > # Run.dG.list_1
> >
> > while IFS= read -r d
> > do
> > echo "${d}.log"
> > python3 DeltaGTable_V_sl.py
> >
> >
> > done  >
> where DeltaGTable_V_sl.py is:
>
> > import numpy as np
> >
> > print('${d}')
> >
> > data = np.genfromtxt('${d}.log', usecols=(1), skip_header=27,
> > skip_footer=1, encoding=None)
> > print(data)
> >
> > np.savetxt('${d}.dG', data, fmt='%12.9f', header='${d}')
> > print(data.dG)
>
> I get:
>
> > (base) comp@AbNormal:~/Apps/Python/PsoVina$ sh ./Run.dG.list_1.sh
> > 14-7.log
> > python3: can't open file 'DeltaGTable_V_sl.py': [Errno 2] No such file
> > or directory
> > 15-7.log
> > python3: can't open file 'DeltaGTable_V_sl.py': [Errno 2] No such file
> > or directory
> > 18-7.log
> > python3: can't open file 'DeltaGTable_V_sl.py': [Errno 2] No such file
> > or directory
> > C-VX3.log
> > python3: can't open file 'DeltaGTable_V_sl.py': [Errno 2] No such file
> > or directory
>
> So, it would appear that the log file labels are in the workspace, but
> '${d}.log' is not being recognized as fname by genfromtxt. Although I
> have googled every combination of terms I can think of, I am obviously
> missing something.
>
> As I have potentially hundreds of files to process, I would appreciate
> pointers towards a solution to the problem.
>
> Thanks in advance.
>
> On 10/08/2019 10:49 AM, Stephen P. Molnar wrote:
> > Many thanks or your kind replies.
> >
> > I really appreciate your suggestions.
> >
> > On 10/08/2019 09:44 AM, Andras Deak wrote:
> >> PS. if you just want to specify the width of the fields you wouldn't
> >> have to convert anything, because you can specify the size and
> >> justification of a %s format. But arguably having float data as floats
> >> is more natural anyway.
> >>
> >> On Tue, Oct 8, 2019 at 3:42 PM Andras Deak 
> >> wrote:
> >>> On Tue, Oct 8, 2019 at 3:17 PM Stephen P. Molnar
> >>>  wrote:
>  I am embarrassed to be asking this question, but I have exhausted
>  Google
>  at this point.
> 
>  I have a number of identically formatted text files from which I
>  want to
>  extract data, as an example (hopefully, putting these in as quotes
>  will
>  persevere the format):
> 
> >
> > ===========================================================
> >
> > PSOVina version 2.0
> > Giotto H. K. Tai & Shirley W. I. Siu
> >
> > Computational Biology and Bioinformatics Lab
> > University of Macau
> >
> > Visit http://cbbio.cis.umac.mo for more information.
> >
> > PSOVina was developed based on the framework of AutoDock Vina.
> >
> > For more information about Vina, please visit
> > http://vina.scripps.edu.
> >
> >
> > ===========================================================

Re: [Numpy-discussion] Forcing gufunc to error with size zero input

2019-09-28 Thread Eric Wieser
Can you just raise an exception in the gufunc's inner loop? Or is there no
mechanism to do that today?

I don't think you were proposing that core dimensions should _never_ be
allowed to be 0, but if you were, I disagree. I put in a fair amount of work
enabling that for linalg because it provided some convenient base cases.

We could go down the route of augmenting the gufuncs signature syntax to
support requiring non-empty dimensions, like we did for optional ones -
although IMO we should consider switching from a string minilanguage to a
structured object specification if we plan to go too much further with
extending it.
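
In the meantime, the wrapper route is only a few lines. A minimal sketch for
a '(i)->()' signature (`my_gufunc` below is a stand-in for the actual
gufunc, not a real object):

    import numpy as np

    def checked(gufunc):
        # reject a zero-length core dimension before the inner loop runs
        def wrapper(x, *args, **kwargs):
            x = np.asarray(x)
            if x.shape[-1] == 0:
                raise ValueError("zero-size core dimension is not allowed")
            return gufunc(x, *args, **kwargs)
        return wrapper

    # my_gufunc = checked(my_gufunc)   # wrap the real gufunc at import time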

On Sat, Sep 28, 2019, 17:47 Warren Weckesser 
wrote:

> I'm experimenting with gufuncs, and I just created a simple one with
> signature '(i)->()'.  Is there a way to configure the gufunc itself so
> that an empty array results in an error?  Or would I have to create a
> Python wrapper around the gufunc that does the error checking?
> Currently, when passed an empty array, the ufunc loop is called with
> the core dimension associated with i set to 0.  It would be nice if
> the code didn't get that far, and the ufunc machinery "knew" that this
> gufunc didn't accept a core dimension that is 0.  I'd like to
> automatically get an error, something like the error produced by
> `np.max([])`.
>
> Warren
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

2019-09-09 Thread Eric Wieser
> In other words `np.arange(100)` (but
with a completely different syntax, probably hidden away only for
libraries to use).

It sounds a bit like you're describing factory classmethods there. Is the
solution to this problem to move (leaving behind aliases) `np.arange` to
`ndarray.arange`, `np.zeros` to `ndarray.zeros`, etc - callers then would
use `type(duckarray).zeros` if they're trying to generalize.
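
A rough sketch of what the generalizing caller would look like under that
scheme (the `zeros` classmethod is the hypothetical part):

    def zeros_like_family(duck, shape):
        # create a new array of the same family as an existing duck array,
        # assuming array types grew factory classmethods as suggested above
        return type(duck).zeros(shape)

    # zeros_like_family(some_duck_array, (100,)) would then resolve to
    # type(some_duck_array).zeros((100,))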

Eric

On Mon, Sep 9, 2019, 21:18 Sebastian Berg 
wrote:

> On Mon, 2019-09-09 at 20:32 -0700, Stephan Hoyer wrote:
> > On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers 
> > wrote:
> > > I think we've chosen to try the former - dispatch on functions so
> > > we can reuse the NumPy API. It could work out well, it could give
> > > some long-term maintenance issues, time will tell. The question is
> > > now if and how to plug the gap that __array_function__ left. Its
> > > main limitation is "doesn't work for functions that don't have an
> > > array-like input" - that left out ~10-20% of functions. So now we
> > > have a proposal for a structural solution to that last 10-20%. It
> > > seems logical to want that gap plugged, rather than go back and say
> > > "we shouldn't have gone for the first 80%, so let's go no further".
> > >
> >
> > I'm excited about solving the remaining 10-20% of use cases for
> > flexible array dispatching, but the unumpy interface suggested here
> > (numpy.overridable) feels like a redundant redo of __array_function__
> > and __array_ufunc__.
> >
> > I would much rather continue to develop specialized protocols for the
> > remaining usecases. Summarizing those I've seen in this thread, these
> > include:
> > 1. Overrides for customizing array creation and coercion.
> > 2. Overrides to implement operations for new dtypes.
> > 3. Overriding implementations of NumPy functions, e.g., FFT and
> > ufuncs with MKL.
> >
> > (1) could mostly be solved by adding np.duckarray() and another
> > function for duck array coercion. There is still the matter of
> > overriding np.zeros and the like, which perhaps justifies another new
> > protocol, but in my experience the use-cases for truly creating an array
> > from scratch are quite rare.
> >
>
> There is an issue open about adding more functions for that. Made me
> wonder if giving a method of choosing the duck-array whose
> `__array_function__` is used, could not solve it reasonably.
> Similar to explicitly choosing a specific template version to call in
> templated code. In other words `np.arange(100)` (but
> with a completely different syntax, probably hidden away only for
> libraries to use).
>
>
> Maybe it is indeed time to write up a list of options to plug that
> hole, and then see where it brings us.
>
> Best,
>
> Sebastian
>
>
> > (2) should be tackled as part of overhauling NumPy's dtype system to
> > better support user defined dtypes. But it should definitely be in
> > the form of specialized protocols, e.g., which pass in preallocated
> > arrays to into ufuncs for a new dtype. By design, new dtypes should
> > not be able to customize the semantics of array *structure*.
> >
> > (3) could potentially motivate a new solution, but it should exist
> > *inside* of select existing NumPy implementations, after checking for
> > overrides with __array_function__. If the only option NumPy provides
> > for overriding np.fft is to implement np.overrideable.fft, I doubt
> > that would suffice to convince MKL developers from monkey patching it
> > -- they already decided that a separate namespace is not good enough
> > for them.
> >
> > I also share Nathaniel's concern that the overrides in unumpy are too
> > powerful, by allowing for control from arbitrary function arguments
> > and even *non-local* control (i.e., global variables) from context
> > managers. This level of flexibility can make code very hard to debug,
> > especially in larger codebases.
> >
> > Best,
> > Stephan
> >
> >
> >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-25 Thread Eric Wieser
One other approach here that perhaps treads a little too close to np.matrix:

class MatrixOpWrapper:
    def __init__(self, arr):  # todo: accept axis arguments here?
        self._array = arr  # todo: assert that arr.ndim >= 2 / call atleast1d
    @property
    def T(self):
        return linalg.transpose(self._array)
    @property
    def H(self):
        return M(self._array.conj()).T
    # add .I too?

M = MatrixOpWrapper

So M(arr).T instead of arr.mT, which has the benefit of not expanding the
number of ndarray members (and those needed by duck-types) further.


On Tue, 25 Jun 2019 at 14:50, Sebastian Berg 
wrote:

> On Tue, 2019-06-25 at 17:00 -0400, Marten van Kerkwijk wrote:
> > Hi Kirill, others,
> >
> > Indeed, it is becoming long! That said, while initially I was quite
> > charmed by Eric's suggestion of deprecating and then changing `.T`, I
> > think the well-argued opposition to it has changed my opinion.
> > Perhaps most persuasive to me was Matthew's point just now that code
> > (or a code snippet) that worked on an old numpy should not silently
> > do something different on a new numpy (unless the old behaviour was a
> > true bug, of course; but here `.T` has always had a very well-defined
> > meaning - even though you are right that the documentation does not
> > exactly lead the novice user away from using it for matrix transpose!
> > It would be great if someone had the time to open a PR that clarifies it.).
> >
> > Note that I do agree with the sentiment that the deprecation/change
> > would likely expose some hidden bugs - and, as noted, it is hard to
> > know where those bugs are if they are hidden! (FWIW, I did find some
> > in astropy's coordinate implementation, which was initially written
> > for scalar coordinates where `.T` worked just fine; as a result,
> > astropy gained a `matrix_transpose` utility function.) Still, it does
> > not quite outweigh to me the disadvantages enumerated.
> >
>
> True, eventually switching is much more problematic than only
> deprecation, and yes, I guess the last step is likely forbidding.
>
> I do not care too much, but the at least the deprecation/warning does
> not seem too bad to me unless it is really widely used for high
> dimensions. Sure, it requires to touch code and may make it uglier, but
> a change requiring to touch a fair amount of scripts is not all that
> uncommon, especially if it can find some bugs (e.g. for me
> scipy.misc.factorial moving for example meant I had to change a lot of
> scripts, annoying but I could live with it).
>
> Although, I might prefer to spend our "force users to do annoying code
> changes" chips on better things. And I guess there may not be much of a
> point in a mere deprecation.
>
>
> > One thing seems clear: if `.T` is out, that means `.H` is out as well
> > (at least as a matrix transpose, the only sensible meaning I think it
> > has). Having `.H` as a conjugate matrix transpose would just cause
> > more confusion about the meaning of `.T`.
> >
>
> I tend to agree, the only way that could work seems if T was deprecated
> for high dimensions.
>
>
> > For the names, my suggestion of lower-casing the M in the initial
> > one, i.e., `.mT` and `.mH`, so far seemed most supported (and I think
> > we should discuss *assuming* those would eventually involve not
> > copying data; let's not worry about implementation details).
>
> It would be a nice assumption, but as I said, I do see an issue with
> object array support. Which makes it likely that `.H` could only be
> supported on some dtypes (similar to `.real/.imag`).
> (Strictly speaking it would be possible to make a ConjugateObject dtype
> and define casting for it, I have some doubt that the added complexity
> is worth it though). The no-copy conjugate is a cool idea but
> ultimately may be a bit too cool?
>
> > So, specific items to confirm:
> >
> > 1) Is this a worthy addition? (certainly, their existence would
> > reduce confusion about `.T`... so far, my sense is tentative yes)
> >
> > 2) Are `.mT` and `.mH` indeed the consensus? [1]
> >
>
> It is likely the only reasonable option, unless you make `H` object
> which does `arr_like**H` but I doubt that is a good idea.
>
> > 3) What, if anything, should these new properties do for 0-d and 1-d
> > arrays: pass through, change shape, or error? (logically, I think
> > *new* properties should never emit warnings: either do something or
> > error).
> 
> > Marten
> >
> > [1] Some sadness about mᵀ and mᴴ - but, then, there is
> > http://www.modernemacs.com/post/prettify-mode/
> >
>
> Hehe, you are using a block for Phonetic Extensions, and that block has
> a second H which looks the same on my font but is Cyrillic. Lucky us,
> we could make one of them for row vectors and the other for column
> vectors ;).
>
> - Sebastian
>
>
> > On Tue, Jun 25, 2019 at 4:17 PM Kirill Balunov <
> > kirillbalu...@gmail.com> wrote:
> > > вт, 25 июн. 2019 г. в 21:20, Cameron Blocker <
> > > cameronjbloc...@gmail.com>:
> > > > It seems 

Re: [Numpy-discussion] new MaskedArray class

2019-06-23 Thread Eric Wieser
I think we’d need to consider separately the operation on the mask and on
the data. In my proposal, the data would always do np.sum(array,
where=~mask), while how the mask would propagate might depend on the mask
itself,

I quite like this idea, and I think Stephan’s strawman design is actually
plausible, where MaskedArray.mask is either an InvalidMask or an IgnoreMask
instance to pick between the different propagation types. Both classes
could simply have an underlying ._array attribute pointing to a duck-array
of some kind that backs their boolean data.
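
A minimal sketch of that strawman - everything beyond the class names and
the reduce semantics quoted further down is an assumption:

    import numpy as np

    class Mask:
        def __init__(self, array):
            # boolean duck-array backing shared by both mask flavours
            self._array = np.asarray(array, dtype=bool)

    class IgnoreMask(Mask):
        """Masked entries are skipped: reductions logical_and.reduce the mask."""

    class InvalidMask(Mask):
        """Masked entries propagate: reductions logical_or.reduce the mask."""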

The second version requires that you *also* know how Mask classes work, and
how they implement +

I remain unconvinced that Mask classes should behave differently on
different ufuncs. I don’t think np.minimum(ignore_na, b) is any different
to np.add(ignore_na, b) - either both should produce b, or both should
produce ignore_na. I would lean towards produxing ignore_na, and
propagation behavior differing between “ignore” and “invalid” only for
reduce / accumulate operations, where the concept of skipping an
application is well-defined.

Some possible follow-up questions that having two distinct masked types
raise:

   - what if I want my data to support both invalid and skip fields at the
   same time? sum([invalid, skip, 1]) == invalid
   - is there a use case for more that these two types of mask?
   invalid_due_to_reason_A, invalid_due_to_reason_B would be interesting
   things to track through a calculation, possibly a dictionary of named masks.

Eric

On Sun, 23 Jun 2019 at 15:28, Stephan Hoyer  wrote:

> On Sun, Jun 23, 2019 at 11:55 PM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> Your proposal would be something like np.sum(array,
>>> where=np.ones_like(array))? This seems rather verbose for a common
>>> operation. Perhaps np.sum(array, where=True) would work, making use of
>>> broadcasting? (I haven't actually checked whether this is well-defined yet.)
>>>
>>> I think we'd need to consider separately the operation on the mask and
>> on the data. In my proposal, the data would always do `np.sum(array,
>> where=~mask)`, while how the mask would propagate might depend on the mask
>> itself, i.e., we'd have different mask types for `skipna=True` (default)
>> and `False` ("contagious") reductions, which differed in doing
>> `logical_and.reduce` or `logical_or.reduce` on the mask.
>>
>
> OK, I think I finally understand what you're getting at. So suppose this
> this how we implement it internally. Would we really insist on a user
> creating a new MaskedArray with a new mask object, e.g., with a GreedyMask?
> We could add sugar for this, but certainly array.greedy_masked().sum() is
> significantly less clear than array.sum(skipna=False).
>
> I'm also a little concerned about a proliferation of MaskedArray/Mask
> types. New types are significantly harder to understand than new functions
> (or new arguments on existing functions). I don't know if we have enough
> distinct use cases for this many types.
>
> Are there use-cases for propagating masks separately from data? If not, it
>>> might make sense to only define mask operations along with data, which
>>> could be much simpler.
>>>
>>
>> I had only thought about separating out the concern of mask propagation
>> from the "MaskedArray" class to the mask proper, but it might indeed make
>> things easier if the mask also did any required preparation for passing
>> things on to the data (such as adjusting the "where" argument in a
>> reduction). I also like that this way the mask can determine even before
>> the data what functionality is available (i.e., it could be the place from
>> which to return `NotImplemented` for a ufunc.at call with a masked index
>> argument).
>>
>
> You're going to have to come up with something more compelling than
> "separation of concerns" to convince me that this extra Mask abstraction is
> worthwhile. On its own, I think a separate Mask class would only obfuscate
> MaskedArray functions.
>
> For example, compare these two implementations of add:
>
> def  add1(x, y):
> return MaskedArray(x.data + y.data,  x.mask | y.mask)
>
> def  add2(x, y):
> return MaskedArray(x.data + y.data,  x.mask + y.mask)
>
> The second version requires that you *also* know how Mask classes work,
> and how they implement +. So now you need to look in at least twice as many
> places to understand add() for MaskedArray objects.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-23 Thread Eric Wieser
If I remember correctly, [numpy.future imports are] actually possible
but hacky. So it would probably be nicer to not go there.

There was some discussion of this at
https://stackoverflow.com/q/29905278/102441.
I agree with the conclusion we should not go there - in particular,
note that every builtin __future__ feature has been an
interpreter-level change, not an object-level change.
from __future__ import division changes the meaning of /, not of int.__div__.
Framing the numpy change this way would mean rewriting Attribute(obj,
attr, Load) ast nodes to Call(np._attr_override, obj, attr), which is
obviously not interoperable with any other module wanting to do the same
thing.
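
Concretely, the rewrite would have to look something like the sketch below,
with `_attr_override` a hypothetical per-module hook - which makes the
interoperability problem plain:

    import ast

    class AttrRewriter(ast.NodeTransformer):
        # turn every `obj.attr` load into `_attr_override(obj, "attr")`
        def visit_Attribute(self, node):
            self.generic_visit(node)
            if not isinstance(node.ctx, ast.Load):
                return node
            return ast.copy_location(
                ast.Call(func=ast.Name(id="_attr_override", ctx=ast.Load()),
                         args=[node.value, ast.Constant(node.attr)],
                         keywords=[]),
                node)

    tree = ast.fix_missing_locations(AttrRewriter().visit(ast.parse("arr.T")))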

This opens other unpleasant cans of worms about “builtin” modules that
perform attribute access:

Should getattr(arr, 'T') change behavior based on the module that calls it?
Should operator.itemgetter('T') change behavior ?

So I do not think we want to go down that road.

On Sun, 23 Jun 2019 at 14:28, Sebastian Berg  wrote:
>
> On Sun, 2019-06-23 at 17:12 -0400, Marten van Kerkwijk wrote:
> > Hi All,
> >
> > I'd love to have `.T` mean the right thing, and am happy that people
> > are suggesting it after I told Steward this was likely off-limits
> > (which, in fairness, did seem to be the conclusion when we visited
> > this before...). But is there something we can do to make it possible
> > to use it already but ensure that code on previous numpy versions
> > breaks? (Or works, but that seems impossible...)
> >
> > For instance, in python2, one had `from __future__ import division
> > (etc.); could we have, e.g., a `from numpy.__future__ import
> > matrix_transpose`, which, when imported, implied that `.T` just did
> > the right thing without any warning? (Obviously, since that
> > __future__.matrix_transpose wouldn't exist on older versions of
> > numpy, it would correctly break the code when used with those.)
> >
>
> If I remember correctly, this is actually possible but hacky. So it
> would probably be nicer to not go there. But yes, you are right, that
> would mean that we practically limit `.T` to 2-D arrays for at least 2
> years.
>
> > Also, a bit more towards the original request in the PR of a
> > hermitian transpose, if we're trying to go for `.T` eventually having
> > the obvious meaning, should we directly move towards also having `.H`
> > as a short-cut for `.T.conj()`? We could even expose that only with
> > the above future import - otherwise, the risk of abuse of `.T` would
> > only grow...
>
> This opens the general question of how many and which attributes we
> actually want on ndarray. My first gut reaction is that I am -0 on it,
> but OTOH, for some math it is very nice and not a huge amount of
> clutter...
>
>
> >
> > Finally, on the meaning of `.T` for 1-D arrays, the sensible choices
> > would seem to (1) error; or (2) change shape to `(n, 1)`. Since while
> > writing this sentence I changed my preference twice, I guess I should
> > go for erroring (I think we need a separate solution for easily
> > making stacks of row/column vectors).
>
> Probably an error is good, which is nice, because we can just tag on a
> warning and not worry about it for a while ;).
>
> >
> > All the best,
> >
> > Marten
> >
> > On Sun, Jun 23, 2019 at 4:37 PM Sebastian Berg <
> > sebast...@sipsolutions.net> wrote:
> > > On Sun, 2019-06-23 at 19:51 +, Hameer Abbasi wrote:
> > > > +1 for this. I have often seen (and sometimes written) code that
> > > does
> > > > this automatically, and it is a common mistake.
> > >
> > > Yeah, likely worth a short. I doubt many uses for the n-dimensional
> > > axis transpose, so maybe a futurewarning approach can work. If not,
> > > I
> > > suppose the solution is the deprecation for ndim != 2.
> > >
> > > Another point about the `.T` is the 1-dimensional case, which
> > > commonly
> > > causes confusion. If we do something here, should think about that
> > > as
> > > well.
> > >
> > > - Sebastian
> > >
> > >
> > > >
> > > > However, we will need some way to filter for intent, as the
> > > people
> > > > who write this code are the ones who didn’t read docs on it at
> > > the
> > > > time, and so there might be a fair amount of noise even if it
> > > fixes
> > > > their code.
> > > >
> > > > I also agree that a transpose of an array with ndim > 2 doesn’t
> > > make
> > > > sense wit

Re: [Numpy-discussion] Syntax Improvement for Array Transpose

2019-06-23 Thread Eric Wieser
This might be contentious, but I wonder if, with a long enough deprecation
cycle, we can change the meaning of .T. That would look like:

* Emit a future warning on `more_than_2d.T` with a message like "in future
.T will transpose just the last two dimensions, not all dimensions. Use
arr.transpose() if transposing all {n} dimensions is deliberate" (see the
sketch after this list)
* Wait 5 releases or so, see how many matches Google / GitHub has for this
warning.
* If the impact is minimal, change .T
* If the impact is large, change to a deprecation warning
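
A sketch of what that first step could look like (illustrative only, not
actual numpy code):

    import warnings

    def transpose_with_warning(arr):
        if arr.ndim > 2:
            warnings.warn(
                "in future .T will transpose just the last two dimensions, "
                "not all dimensions. Use arr.transpose() if transposing all "
                "{} dimensions is deliberate".format(arr.ndim),
                FutureWarning, stacklevel=2)
        return arr.transpose()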

An argument for this approach: a good amount of code I've seen in the wild
already assumes T is a 2d transpose, and as a result does not work
correctly when called with stacks of arrays. Changing T might fix this
broken code automatically.

If the change would be too intrusive, then keeping the deprecation warning
at least prevents new users from deliberately using .T for >2d transposes,
which is possibly valuable for readers.

Eric


On Sun, Jun 23, 2019, 12:05 Stewart Clelland 
wrote:

> Hi All,
>
> Based on discussion with Marten on github
> , I have a couple of
> suggestions on syntax improvements on array transpose operations.
>
> First, introducing a shorthand for the Hermitian Transpose operator. I
> thought "A.HT" might be a viable candidate.
>
> Second, the adding an array method that operates like a normal transpose.
> To my understanding,
> "A.tranpose()" currently inverts the usual order of all dimensions. This
> may be useful in some applications involving tensors, but is not what I
> would usually assume a transpose on a multi-dimensional array would entail.
> I suggest a syntax of "A.MT" to indicate a transpose of the last two
> dimensions by default, maybe with optional arguments (i,j) to indicate
> which two dimensions to transpose.
>
> I'm new to this mailing list format, hopefully I'm doing this right :)
>
> Thanks,
> Stew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] New release note strategy after branching 1.17.

2019-06-12 Thread Eric Wieser
It's worth linking to the issue where this discussion started, so we avoid
repeating ourselves -
 https://github.com/numpy/numpy/issues/13707.

Eric

On Wed, Jun 12, 2019, 11:51 Nathaniel Smith  wrote:

> It might be worth considering a tool like 'towncrier'. It's automation to
> support the workflow where PRs that make changes also include their release
> notes, so when the release comes you've already done all the work and just
> have to hit the button.
>
> On Wed, Jun 12, 2019, 07:59 Sebastian Berg 
> wrote:
>
>> Hi all,
>>
>> we had discussed trying a new strategy to gather release notes on the
>> last community call, but not followed up on it on the list yet.
>>
>> For the next release, we decided to try a strategy of using a wiki page
>> to gather release notes. The main purpose for this is to avoid merge
>> conflicts in the release notes file. It may also make things slightly
>> easier for new contributors even without merge conflicts.
>> Any comments/opinions about other alternatives are welcome.
>>
>> We probably still need to fix some details, but I this will probably
>> mean:
>>
>> 1. We tag issues with "Needs Release Notes"
>> 2. We ask contributors/maintainers to edit the initial PR post/comment
>> with a release note snippet. (I expect maintainers may typically put in
>> a placeholder as a start for the contributor.)
>> 3. After merging, the release notes are copied into the wiki by the
>> user or a contributor. After the copy happened, the label could/should
>> be removed?
>>
>> SciPy uses a similar strategy, so they may already have some experience
>> to do it slightly different that I am missing.
>>
>> Best Regards,
>>
>> Sebastian
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] defining a NumPy API standard?

2019-06-02 Thread Eric Wieser
Some of your categories here sound like they might be suitable for ABCs
that provide mixin methods, which is something I think Hameer suggested in
the past. Perhaps it's worth re-exploring that avenue.
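
A rough sketch of the idea for the shape-related category (all names here
are assumptions, not an existing numpy API):

    from abc import ABC, abstractmethod

    class ShapedArray(ABC):
        # duck types implement a small abstract core...
        @property
        @abstractmethod
        def shape(self): ...

        @abstractmethod
        def transpose(self, *axes): ...

        # ...and inherit the derived conveniences as mixin methods
        @property
        def ndim(self):
            return len(self.shape)

        @property
        def T(self):
            return self.transpose()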

Eric

On Sat, Jun 1, 2019, 18:18 Marten van Kerkwijk 
wrote:

>
> Our API is huge. A simple count:
>> main namespace: 600
>> fft: 30
>> linalg: 30
>> random: 60
>> ndarray: 70
>> lib: 20
>> lib.npyio: 35
>> etc. (many more ill-thought out but not clearly private submodules)
>>
>>
> I would perhaps start with ndarray itself. Quite a lot seems superfluous
>
> Shapes:
> - need: shape, strides, reshape, transpose;
> - probably: ndim, size, T
> - less so: nbytes, ravel, flatten, squeeze, and swapaxes.
>
> Getting/setting:
> - need __getitem__, __setitem__;
> - less so: fill, put, take, item, itemset, repeat, compress, diagonal;
>
> Datatype/Copies/views/conversion
> - need: dtype, copy, view, astype, flags
> - less so: ctypes, dump, dumps, getfield, setfield, itemsize, byteswap,
> newbyteorder, resize, setflags, tobytes, tofile, tolist, tostring,
>
> Iteration
> - need __iter__
> - less so: flat
>
> Numerics
> - need: conj, real, imag
> - maybe also: min, max, mean, sum, std, var, prod, partition, sort, trace;
> - less so: the arg* ones, cumsum, cumprod, clip, round, dot, all, any,
> nonzero, ptp, searchsorted, choose.
>
> All the best,
>
> Marten
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Style guide for numpy code?

2019-05-09 Thread Eric Wieser
Joe,

While most of your style suggestions are reasonable, I would actually
recommend the opposite of the first point you make in (a), especially
if you're trying to write generic reusable code.

> For example, an item count is always an integer, but a distance is always a 
> float.

This is close, but `int` and `float` are implementation details. I
think a more precise way to state this is _"an item count is a
`numbers.Integral`, a distance is a `numbers.Real`"_.

Where this distinction matters is if you start using `decimal.Decimal`
or `fractions.Fraction` for your distances. Those are subclasses of
`numbers.Real`, but if you mix them with floats, you either lose
precision or crash because the operation refuses to mix them:
```python
In [10]: from fractions import Fraction; from decimal import Decimal

In [11]: Fraction(1, 3) + 1.0
Out[11]: 1.3333333333333333

In [12]: Fraction(1, 3) + 1
Out[12]: Fraction(4, 3)

In [15]: Decimal('0.1') + 0
Out[15]: Decimal('0.1')

In [16]: Decimal('0.1') + 0.
TypeError: unsupported operand type(s) for +: 'decimal.Decimal' and 'float'
```

For an example of this coming up in real-world functions, look at
https://github.com/numpy/numpy/pull/13390

Eric

On Thu, 9 May 2019 at 11:19, Joe Harrington  wrote:
>
> I have a handout for my PHZ 3150 Introduction to Numerical Computing course 
> that includes some rules:
>
> (a) All integer-valued floating-point numbers should have decimal points 
> after them. For
> example, if you have a time of 10 sec, do not use
>
> y = np.e**10 # sec
>
> use
>
> y = np.e**10. # sec
>
> instead.  For example, an item count is always an integer, but a distance is 
> always a float.  A decimal in the range (-1,1) must always have a zero before 
> the decimal point, for readability:
>
> x = 0.23 # Right!
>
> x = .23 # WRONG
>
> The purpose of this one is simply to build the decimal-point habit.  In 
> Python it's less of an issue now, but sometimes code is translated, and 
> integer division is still out there.  For that reason, in other languages, it 
> may be desirable to use a decimal point even for counts, unless integer 
> division is wanted.  Make a comment whenever you intend integer division and 
> the language uses the same symbol (/) for both kinds of division.
>
> (b) Use spaces around binary operations and relations (=<>+-*/). Put a space 
> after “,”.
> Do not put space around “=” in keyword arguments, or around “ ** ”.
>
> (c) Do not put plt.show() in your homework file! You may put it in a comment 
> if you
> like, but it is not necessary. Just save the plot. If you say
>
> plt.ion()
>
> plots will automatically show while you are working.
>
> (d) Use:
>
> import matplotlib.pyplot as plt
>
> NOT:
>
> import matplotlib.pylab as plt
>
> (e) Keep lines to 80 characters, max, except in rare cases that are well 
> justified, such as
> very long strings. If you make comments on the same line as code, keep them 
> short or
> break them over more than a line:
>
> code = code2   # set code equal to code2
>
> # Longer comment requiring much more space because
> # I'm explaining something complicated.
> code = code2
>
> code = code2   # Another way to do a very long comment,
>                # like this one, which runs over more than
>                # one line.
>
> (f) Keep blocks of similar lines internally lined up on decimals, comments, 
> and = signs.  This makes them easier to read and verify.  There will be some 
> cases when this is impractical.  Use your judgment (you're not a computer, 
> you control the computer!):
>
> x    =   1.       # this is a comment
> y    = 378.2345   # here's another
> fred = chuck      # note how the decimals, = signs, and
>                   # comments line up nicely...
> alacazamshmazooboloid = 2721  # but not always!
>
> (g) Put the units and sources of all values in comments:
>
> t_planet = 523. # K, Smith and Jones (2016, ApJ 234, 22)
>
> (h) I don't mean to start a religious war, but I emphasize the alignment of 
> similar adjacent code lines to make differences pop out and reduce the 
> likelihood of bugs.  For example, it is much easier to verify the correctness 
> of:
>
> a     = 3 * x + 3 * 8. * short        - 5. * np.exp(np.pi * omega * t)
> a_alt = 3 * x + 3 * 8. * anotshortvar - 5. * np.exp(np.pi * omega * t)
>
> than:
>
> a = 3 * x + 3 * 8. * short - 5. * np.exp(np.pi * omega * t)
> a_altvarname = 3 * x + 3*9*anotshortvar - 5. * np.exp(np.pi * omega * i)
>
> (i) Assign values to meaningful variables, and use them in formulae and 
> functions:
>
> ny = 512
> nx = 512
> image = np.zeros((ny, nx))
> expr1 = ny * 3
> expr2 = nx * 4
>
> Otherwise, later on when you upgrade to 2560x1440 arrays, you won't know 
> which of the 512s are in the x direction and which are in the y direction.  
> Or, the student you (now a senior researcher) assign to code the upgrade 
> won't!  Also, it reduces bugs arising from the order of arguments to 
> functions if the args have meaningful names.  This is not to say that you 
> should assign all numbers to functions.  This is fine:
>
> circ = 2 * np.pi * r
>
>

Re: [Numpy-discussion] Boolean arrays with nulls?

2019-04-18 Thread Eric Wieser
One option here would be to use masked arrays:

arr = np.ma.zeros(3, dtype=bool)
arr[0] = True
arr[1] = False
arr[2] = np.ma.masked

giving

masked_array(data=[True, False, --],
             mask=[False, False,  True],
       fill_value=True)
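
And if interop with nan-aware float code is the goal, the masked bools can
be converted at the boundary (continuing the snippet above; note this
materializes a float temporary):

as_float = arr.astype(float).filled(np.nan)
# array([ 1.,  0., nan]); np.isnan(as_float) recovers the mask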

On Thu, 18 Apr 2019 at 10:51, Stuart Reynolds  wrote:
>
> Thanks. I’m aware of bool arrays.
> I think the tricky part of what I’m looking for is NULLability and 
> interoperability with code that deals with nullable data (float arrays).
>
> Currently the options seem to be float arrays, or custom operations that 
> carry (unabstracted) categorical array data representations, such as:
> 0: false
> 1: true
> 2: NULL
>
> ... which wouldn’t be compatible with algorithms that use, say, np.isnan.
> Ideally, it would be nice to have a structure that was float-like in that 
> it’s compatible with nan-aware operations, but it’s storage is just a single 
> byte per cell (or less).
>
> Is float8 a thing?
>
>
> On Thu, Apr 18, 2019 at 9:46 AM Stefan van der Walt  
> wrote:
>>
>> Hi Stuart,
>>
>> On Thu, 18 Apr 2019 09:12:31 -0700, Stuart Reynolds wrote:
>> > Is there an efficient way to represent bool arrays with null entries?
>>
>> You can use the bool dtype:
>>
>> In [5]: x = np.array([True, False, True])
>>
>> In [6]: x
>> Out[6]: array([ True, False,  True])
>>
>> In [7]: x.dtype
>> Out[7]: dtype('bool')
>>
>> You should note that this stores one True/False value per byte, so it is
>> not optimal in terms of memory use.  There is no easy way to do
>> bit-arrays with NumPy, because we use strides to determine how to move
>> from one memory location to the next.
>>
>> See also: 
>> https://www.reddit.com/r/Python/comments/5oatp5/one_bit_data_type_in_numpy/
>>
>> > What I’m hoping for is that there’s a structure that is ‘viewed’ as
>> > nan-able float data, but backed but a more efficient structures
>> > internally.
>>
>> There are good implementations of this idea, such as:
>>
>> https://github.com/ilanschnell/bitarray
>>
>> Those structures cannot typically utilize the NumPy machinery, though.
>> With the new array function interface, you should at least be able to
>> build something that has something close to the NumPy API.
>>
>> Best regards,
>> Stéfan
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Porting code for Numpy 1.13+ to get rid of "boolean index did not match indexed array along dimension 1" error

2019-02-12 Thread Eric Wieser
It looks like your code is wrong, and numpy 1.12 happened to let you get
away with it

This line:

evals = evals[evals > tolerance]

Reduces the eigenvalues to only those which are greater than the tolerance

When you do U[:, evals > tolerance], evals > tolerance is just going
to be [True, True, ...].

You need to swap the last two lines, to

U = U[:, evals > tolerance]
evals = evals[evals > tolerance]

Or better yet, introduce an intermediate variable:

keep = evals > tolerance
evals = evals[keep]
U = U[:, keep]

Eric

On Tue, 12 Feb 2019 at 15:16 Mauro Cavalcanti  wrote:

> Dear ALL,
>
> I am trying to port an eigenalysis function that runs smoothly on Numpy
> 1.12 but fail miserably on Numpy 1.13 or higher with the dreadful error
> "boolean index did not match indexed array along dimension 1".
>
> Here is a fragment of the code, where the error occurrs:
>
> evals, evecs = np.linalg.eig(Syy)
> idx = evals.argsort()[::-1]
> evals = np.real(evals[idx])
> U = np.real(evecs[:, idx])
> evals = evals[evals > tolerance]
> U = U[:, evals > tolerance] # Here is where the error occurs
>
> So, I ask: is there a way out of this?
>
> Thanks in advance for any assistance you can provide.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] py3k.os_fspath enforces inconsist rules with python3 os.fspath

2019-01-14 Thread Eric Wieser
This looks like a bug to me - can you file it on the issue tracker?

Evidently I did not consider python 2 behavior when backporting `os.fspath`
from python 3.

Eric

On Mon, 14 Jan 2019 at 16:28 Stuart Reynolds 
wrote:

> After a recent upgrade of numpy (1.13.1 -> 1.16.0), my code is failing
> where I provide unicode objects as filenames.
> Previously they were allowed. Now that are not, and I *must* provide a
> (py2) str or bytearray only.
>
> # str is OK
> $ python2.7 -c "from numpy.compat import py3k; print py3k.os_fspath('123')"
> 123
>
> # unicode is not
> $ python -c "from numpy.compat import py3k; print py3k.os_fspath(u'123')"
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/usr/local/lib/python2.7/dist-packages/numpy/compat/py3k.py", line
> 237, in os_fspath
> "not " + path_type.__name__)
> TypeError: expected str, bytes or os.PathLike object, not unicode
>
> But this enforcement of "str, bytes or os.PathLike" comes from:
>https://docs.python.org/3/library/os.html
> where in Python 3 str is a unicode, and moreover, os.fspath allows
>
> $ python3 -c "import os; print(os.fspath(u'123'))"   # unicode str
> 123
> $ python3 -c "import os; print(os.fspath('123'))"   # also unicode str
> 123
>
>  so... shouldn't py3k.os_fspath allow py2 unicode objects?
>
> - Stu
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Add guaranteed no-copy to array creation and reshape?

2019-01-12 Thread Eric Wieser
I don’t think a NEVERCOPY entry in arr.flags would make much sense.
Is this really a sensible limitation to put on how data gets used? Isn’t it
up to the algorithm to decide whether to copy its data, not the original
owner of the data?

It also leads to some tricky questions of precedence - would np.array(arr,
copy=True) respect the flag or the argument? How about np.array(arr)? Is arr
+ 0 considered a copy?
By keeping it as a value passed in via a copy= kwarg, we don’t need to
answer any of those questions.
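
To make that concrete, compare the two spellings (both hypothetical):

b = np.array(arr, copy=np.never_copy)  # intent is scoped to this one call

arr.flags.nevercopy = True             # hypothetical flag: intent travels
third_party_function(arr)              # ...into code that never agreed to it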

Eric

On Thu, 10 Jan 2019 at 20:28 Ralf Gommers ralf.gomm...@gmail.com wrote:

On Thu, Jan 10, 2019 at 11:21 AM Feng Yu  wrote:
>
>> Hi Todd,
>>
>> I agree a flag is more suitable than classes.
>>
>> Another bonus of a flag over a function argument is that it avoids
>> massive contamination of function signatures for a global variation of
>> behavior that affects many functions.
>>
>
> I like this suggestion. Copy behavior fits very nicely with existing flags
> (e.g. UPDATEIFCOPY, WRITEABLE) and avoids both namespace pollution and
> complicating docstrings.
>
> Ralf
>
>
>> Yu
>>
>> On Wed, Jan 9, 2019 at 11:34 PM Todd  wrote:
>>
>>> On Mon, Jan 7, 2019, 14:22 Feng Yu >>
 Hi,

 Was it ever brought up the possibility of a new array class (ndrefonly,
 ndview) that is strictly no copy?

 All operations on ndrefonly will return ndrefonly and if the operation
 cannot be completed without making a copy, it shall throw an error.

 On the implementation there are two choices if we use subclasses:

 - ndrefonly can be a subclass of ndarray. The pattern would be subclass
 limiting functionality of super, but ndrefonly is a ndarray.
 - ndarray as a subclass of ndrefonly. Subclass supplements functionality
 of super: ndarray will not throw an error when a copy is necessary.
 However ndrefonly is not an ndarray.

 If we want to be wild they do not even need to be subclasses of each
 other, or maybe they shall both be subclasses of something more
 fundamental.

 - Yu

>>>
>>> I would prefer a flag for this.  Someone can make an array read-only by
>>> setting `arr.flags.writable=False`.  So along those lines, we could have a
>>> `arr.flags.copyable` flag that if set to `False` would result in an error
>>> of any operation tried to copy the data.
>>>
 ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Add guaranteed no-copy to array creation and reshape?

2019-01-09 Thread Eric Wieser
Slicing is a lot more important than some keyword. And design-wise, filling
the numpy namespace with singletons for keyword to other things in that
same namespace just makes no sense to me.

At least from the perspective of discoverability, you could argue that
string constants form a namespace of their won, and so growing the “string”
namespace is not inherently better than growing any other. The main flaw in
that comparison is that picking np.never_copy to be a singleton forever
prevents us reusing that name to become a function.

Perhaps the solution is to use np.NEVER_COPY instead - that’s never going
to clash with a function name we want to add in future, and using upper
attributes as arguments in that way is pretty typical for python (
subprocess.PIPE, socket.SOCK_STREAM, etc…)

You could fairly argue that this approach is outdated in the face of
enum.Enum - in which case we could go for the more heavy-handed
np.CopyMode.NEVER, which still has a unique enough case for name clashes
with functions never to be an issue.
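
That heavier spelling would at least be trivial to implement (a sketch; the
member names are assumptions):

import enum

class CopyMode(enum.Enum):
    IF_NEEDED = 0
    ALWAYS = 1
    NEVER = 2

# call sites would then read np.array(arr, copy=np.CopyMode.NEVER)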

Eric

On Wed, 9 Jan 2019 at 22:25 Ralf Gommers  wrote:

> On Mon, Jan 7, 2019 at 11:30 AM Eric Wieser 
> wrote:
>
>>
>> @Ralf
>>
>> np.newaxis is not relevant here - it’s a simple alias for None, is just
>> there for code readability, and is much more widely applicable than
>> np.never_copy would be.
>>
>> Is there any particular reason we chose to use None? If I were designing
>> it again, I’d consider a singleton object with a better __repr__
>>
> It stems from Numeric:
> https://mail.python.org/pipermail/python-list/2009-September/552203.html.
> Note that the Python builtin slice also uses None, but that's probably due
> to Numeric using it first.
>
> Agree that a singleton with a nice repr could be a better choice than
> None. The more important part of my comment was "widely applicable" though.
> Slicing is a lot more important than some keyword. And design-wise, filling
> the numpy namespace with singletons for keyword to other things in that
> same namespace just makes no sense to me.
>
> Cheers,
> Ralf
>
>
>
>> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] timedelta64 remainder behavior with div by 0

2019-01-08 Thread Eric Wieser
If we consider it a bug, we could patch it in 1.16.1 (or are we still
waiting on 1.16.0?), which would minimize the backwards compatibility cost.

Eric

On Tue, 8 Jan 2019 at 10:05 Stefan van der Walt 
wrote:

> On Tue, 08 Jan 2019 09:57:03 -0800, Tyler Reddy wrote:
> > np.timedelta64(5) % np.timedelta64(0) -> numpy.timedelta64(0)
> >
> > In contrast, np.float64(1) % np.float64(0) -> nan
> >
> > There's a suggestion that we should switch to returning NaT for the
> > timedelta64 case for consistency, and that this probably isn't too
> harmful
> > given how recent these additions are.
>
> That seems like a reasonable change to me; one could probably consider the
> previous behavior a bug?
>
> Stéfan
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Add guaranteed no-copy to array creation and reshape?

2019-01-07 Thread Eric Wieser
@Matthias:

Most of the time I would not assign to arr.shape, but in some rare
occasions I find it very useful.

And one of those rare occasions is when you want guaranteed no-copy
behavior.

Can you come up with any other example?
The only real argument you seem to have here is “my code uses arr.shape =
...“ and I don’t want it to break. That’s a fair argument, but all it
really means is we should start emitting DeprecationWarning("Use arr =
arr.reshape(..., copy=np.never_copy) instead of arr.shape = ..."), and
consider having a long deprecation.
If necessary we could compromise on just putting a warning in the docs, and
not notifying the user at all.
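
For reference, a minimal sketch of the shape-assignment idiom in question -
today it is the one spelling that already guarantees no copy, by raising
when a view is impossible:

import numpy as np

a = np.arange(6).reshape(2, 3)
t = a.T                 # a non-contiguous view
try:
    t.shape = (6,)      # cannot be done without a copy, so this raises
except AttributeError as e:
    print(e)            # exact wording varies between numpy versions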

@Ralf

np.newaxis is not relevant here - it’s a simple alias for None, is just
there for code readability, and is much more widely applicable than
np.never_copy would be.

Is there any particular reason we chose to use None? If I were designing it
again, I’d consider a singleton object with a better __repr__

@Nathaniel

I guess another possibility to throw out there would be a second kwarg,
require_view=False/True.

The downside of this approach is that array-likes will definitely need
updating to support this new behavior, whereas many may work out of the box
if we extend the copy argument (like, say, maskedarray). This also ties
into the __bool__ override - that will ensure that subclasses which don’t
have a trivial reshape crash.

@Sebastian:

Unless we replace the string when dispatching, which seems strange on first
sight.

I’m envisaging cases where we don’t have a dispatcher at all:

   - Duck arrays implementing methods matching ndarray
   - Something like my_custom_function(arr, copy=...) that forwards its
   copy argument to reshape

Eric

On Mon, 7 Jan 2019 at 11:05 Matthias Geier  wrote:

> On Wed, Jan 2, 2019 at 2:24 PM Sebastian Berg wrote:
> >
> > On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
> > > Hi Sebastian.
> > >
> > > Thanks for the clarification.
> > >
> > 
> > > > print(arr.shape)  # also (5, 2)
> > > >
> > > > so the arr container (shape, dtype) is changed/muted. I think we
> > > > expect
> > > > that for content here, but not for the shape.
> > >
> > > Thanks for the clarification, I think I now understand your example.
> > >
> > > However, the behavior you are describing is just like the normal
> > > reference semantics of Python itself.
> > >
> > > If you have multiple identifiers bound to the same (mutable) object,
> > > you'll always have this "problem".
> > >
> > > I think every Python user should be aware of this behavior, but I
> > > don't think it is reason to discourage assigning to arr.shape.
> >
> > Well, I doubt I will convince you.
>
> I think we actually have quite little disagreement.
>
> I agree with you on what should be done *most of the time*, but I
> wouldn't totally discourage mutating NumPy array shapes, because I
> think in the right circumstances it can be very useful.
>
> > But want to point out that a numpy
> > array is:
> >
> >   * underlying data
> >   * shape/strides (pointing to the exact data)
> >   * data type (interpret the data)
> >
> > Arrays are mutable, but this is only half true from my perspective.
> > Everyone using numpy should be aware of "views", i.e. that the content
> > of the underlying data can change.
>
> I agree, everyone should be aware of that.
>
> > However, if I have a read-only array, and pass it around, I would not
> > expect it to change. That is because while the underlying data is
> > mutated, how this data is accessed and interpreted is not.
> >
> > In other words, I see array objects as having two sides to them [0]:
> >
> >   * Underlying data   -> normally mutable and often mutated
> >   * container:        -> not mutated by almost all code
> >   * shape/strides
> >   * data type
>
> Exactly: "almost all code".
>
> Most of the time I would not assign to arr.shape, but in some rare
> occasions I find it very useful.
>
> And one of those rare occasions is when you want guaranteed no-copy
> behavior.
>
> There are also some (most likely significantly rarer) cases where I
> would modify arr.strides.
>
> > I realize that in some cases mutating the container metadata happens. But
> > I do believe it should be as minimal as possible. And frankly, probably
> > one could do away with it completely.
>
> I guess that's the only point where we disagree.
>
> I wouldn't completely discourage it and I would definitely not remove
> the functionality.
>
> > Another example for where it is bad would be a threaded environment. If
> > a python function temporarily changes the shape of an array to read
> > from it without creating a view first, this will break multi-threaded
> > access to that array.
>
> Sure, let's not use it while multi-threading then.
>
> I still think that's not at all a reason to remove the feature.
>
> There are some things that are problematic when multi-threading, but
> that's typically not reason enough to completely disallow them.
>
> cheers,
> Matthias

[Numpy-discussion] Adding more detailed exception types to numpy

2019-01-04 Thread Eric Wieser
PR #12593 <https://github.com/numpy/numpy/pull/12593> adds a handful of new
exception types for ufunc errors. Some consequences of this change are that:

   1. The string formatting is moved to python, allowing us to give better
   error messages without a lot of work
   2. The formatting is dispatched lazily, meaning that users who try
   ufunc calls but catch unsupported-loop errors don’t have to pay the cost
   of string formatting
   3. Users can catch a specific exception type. This might expose more of
   our internals than we’re willing to though.
   4. We need to expose these new exception names in the public API.

3 & 4 raise some questions, which I’d like some feedback on:

   -

   Should we actually expose the detailed exception types to the user, or
   should they be kept an implementation detail? One way to hide the
   implementation details would be

   import builtins

   class TypeError(builtins.TypeError):
       """actually UFuncCastingError"""
   _UFuncCastingError = TypeError  # for internal use when raising
   _UFuncCastingError.__module__ = None

   class TypeError(builtins.TypeError):
       """actually UFuncLoopError"""
   _UFuncLoopError = TypeError  # for internal use when raising
   _UFuncLoopError.__module__ = None

   del TypeError

   This gives us all the advantages of 1 & 2 without the user noticing that
   they’re receiving anything other than TypeError, which their tracebacks
   will continue to show.
   -

   If we do expose them, where should these exceptions go? In the past, we
   also added AxisError and TooHardError - it would be nice to start being
   consistent about where to expose these things
   1. np.UFuncCastingError (expands the global namespace even further)
   2. np.core._exceptions.UFuncCastingError (shouldn’t really be
      private, since users won’t know how to catch it)
   3. np.core.UFuncCastingError
   4. np.core.umath.CastingError
   5. A dedicated namespace for exceptions:
      - np.errors.UFuncCastingError (matches pandas)
      - np.exceptions.UFuncCastingError
      - np.exc.UFuncCastingError (matches sqlalchemy)
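
Note that with either layout, code that catches the builtin TypeError keeps
working, since the new types subclass it:

import numpy as np

try:
    np.add(np.array(['a']), 1)   # no matching ufunc loop
except TypeError:
    pass  # catches whether or not the raised object is a hidden subclass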

Eric
​
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Add guaranteed no-copy to array creation and reshape?

2018-12-27 Thread Eric Wieser
I think np.never_copy is really ugly. I’d much rather simply use ‘never’,

I actually think np.never_copy is the better choice of the two. Arguments
for it:

   - It’s consistent with np.newaxis in spelling (modulo the _)
   - If mistyped, it can be caught easily by IDEs.
   - It’s typeable with mypy, unlike constant string literals which
   currently aren’t
   - If code written against new numpy is run on old numpy, it will error
   rather than doing the wrong thing
   - It won’t be possible to miss parts of our existing API that evaluate a
   copy argument in a boolean context
   - It won’t be possible for downstream projects to misinterpret by not
   checking for ‘never’ - an error will be raised instead.

Arguments against it:

   - It’s more characters than "never"
   - The implementation of np.never_copy is a little verbose / ugly
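
On that second point, a sketch of roughly what the singleton would need
(details assumed; this is not an existing implementation):

class _NeverCopy:
    def __bool__(self):
        # guard against code paths that still evaluate `copy` as a bool
        raise TypeError("never_copy is not a truth value; "
                        "test `copy is np.never_copy` instead")
    def __repr__(self):
        return "np.never_copy"

never_copy = _NeverCopy()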

Eric

On Thu, Dec 27, 2018, 1:41 AM Ralf Gommers wrote:
> On Wed, Dec 26, 2018 at 3:29 PM Sebastian Berg 
> wrote:
>
>> Hi all,
>>
>> In https://github.com/numpy/numpy/pull/11897 I am looking into the
>> addition of a `copy=np.never_copy` argument to:
>>   * np.array
>>   * arr.reshape/np.reshape
>>   * arr.astype
>>
>> Which would cause an error to be raised when numpy cannot guarantee
>> that the returned array is a view of the input array.
>> The motivation is to easier avoid accidental copies of large data, or
>> ensure that in-place manipulation will be meaningful.
>>
>> The copy flag API would be:
>>   * `copy=True` forces a copy
>>   * `copy=False` allows numpy to copy if necessary
>>   * `copy=np.never_copy` will error if a copy would be necessary
>>   * (almost) all other input will be deprecated.
>>
>> Unfortunately using `copy="never"` is tricky, because currently
>> `np.array(..., copy="never")` behaves exactly the same as
>> `np.array(..., copy=bool("never"))`. So that the wrong result would be
>> given on old numpy versions and it would be a behaviour change.
>>
>
> I think np.never_copy is really ugly. I'd much rather simply use 'never',
> and clearly document that if users start using this and they critically
> rely on it really being never, then they should ensure that their code is
> only used with numpy >= 1.17.0.
>
> Note also that this would not be a backwards compatibility break, because
> `copy` is now clearly documented as bool, and not bool_like or some such
> thing. So we do not need to worry about the very improbable case that users
> now are using `copy='never'`.
>
> If others think `copy='never'` isn't acceptable now, there are two other
> options:
> 1. add code first to catch `copy='never'` in 1.17.x and raise on it, then
> in a later numpy version introduce it.
> 2. just do nothing. I'd prefer that over `np.never_copy`.
>
> Cheers,
> Ralf
>
>
>> Some things that are a not so nice maybe:
>>  * adding/using `np.never_copy` is not very nice
>>  * Scalars need a copy and so will not be allowed
>>  * For rare array-likes numpy may not be able to guarantee no-copy,
>>although it could happen (but should not).
>>
>>
>> The history is that a long while ago I considered adding a copy flag to
>> `reshape` so that it is possible to do `copy=np.never_copy` (or
>> similar) to ensure that no copy is made. In these, you may want
>> something like an assertion:
>>
>> ```
>> new_arr = arr.reshape(new_shape)
>> assert np.may_share_memory(arr, new_arr)
>>
>> # Which is sometimes -- but should not be -- written as:
>> arr.shape = new_shape  # unnecessary container modification
>>
>> # Or:
>> view = np.array(arr, order="F")
>> assert np.may_share_memory(arr, new_arr)
>> ```
>>
>> but is more readable and will not cause an intermediate copy on error.
>>
>>
>> So what do you think? Other variants would be to not expose this for
>> `np.array` and probably limit `copy="never"` to the reshape method. Or
>> just to not do it at all. Or to also accept "never" for `reshape`,
>> although I think I would prefer to keep it in sync and wait for a few
>> years to consider that.
>>
>> Best,
>>
>> Sebastian
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] three-dim array

2018-12-26 Thread Eric Wieser
The rationale for the change allowing that construction was twofold: it's
easier to understand what has gone wrong when seeing the `list`s in the
repr than it was from the cryptic error message; and there were some jagged
cases that already succeeded in this way, and it was less confusing to be
consistent.

I agree that the behavior is not terribly useful, and object arrays
constructed containing lists are quite possibly something we should warn
about.

Eric
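
For anyone bitten by this: with numpy of this era the construction succeeds
silently, so a cheap guard is to check the dtype of the result (illustrative
only):

```
import numpy as np

a = np.array([[1, 2], [[1, 2], [3, 4]]])
a.shape   # (2, 2)
a.dtype   # dtype('O') - an object array, often the sign of a typo

# a cheap guard against accidentally-jagged input:
assert a.dtype != object, "input was not rectangular"
```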

On Thu, Dec 27, 2018, 2:22 AM Benjamin Root wrote:

> Ewww, kinda wish that would be an error... It would be too easy for a typo
> to get accepted this way.
>
> On Wed, Dec 26, 2018 at 1:59 AM Eric Wieser 
> wrote:
>
>> In the latest version of numpy, this runs without an error, although may
>> or may not be what you want:
>>
>> In [1]: np.array([[1,2],[[1,2],[3,4]]])
>> Out[1]:
>> array([[1, 2],
>>[list([1, 2]), list([3, 4])]], dtype=object)
>>
>> Here the result is a 2x2 array, where some elements are numbers and
>> others are lists.
>>
>>
>> On Wed, 26 Dec 2018 at 06:23 Mark Alexander Mikofski <
>> mikof...@berkeley.edu> wrote:
>>
>>> I believe numpy arrays must be rectangular, yours is jagged, instead try
>>>
>>> >>> x3d = np.array([[[1, 2], [1, 2], [3, 4]]])
>>> >>> x3d.shape
>>> (1, 3, 2)
>>>
>>> Note: 3 opening brackets, yours has 2
>>> And single brackets around the 3 innermost arrays, yours has single
>>> brackets for the 1st, and double brackets around the 2nd and 3rd
>>>
>>>
>>> On Tue, Dec 25, 2018, 6:20 PM Jeff wrote:
>>>> hello,
>>>>
>>>> sorry, newbie to numpy.
>>>>
>>>> I want to define a three-dim array.
>>>> I know this works:
>>>>
>>>>  >>> np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
>>>> array([[[1, 2],
>>>>  [3, 4]],
>>>>
>>>> [[5, 6],
>>>>  [7, 8]]])
>>>>
>>>> But can you tell why this doesn't work?
>>>>
>>>>  >>> np.array([[1,2],[[1,2],[3,4]]])
>>>> Traceback (most recent call last):
>>>>File "", line 1, in 
>>>> ValueError: setting an array element with a sequence.
>>>>
>>>>
>>>> Thank you.
>>>> ___
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion@python.org
>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] three-dim array

2018-12-25 Thread Eric Wieser
In the latest version of numpy, this runs without an error, although may or
may not be what you want:

In [1]: np.array([[1,2],[[1,2],[3,4]]])
Out[1]:
array([[1, 2],
   [list([1, 2]), list([3, 4])]], dtype=object)

Here the result is a 2x2 array, where some elements are numbers and others
are lists.

On Wed, 26 Dec 2018 at 06:23 Mark Alexander Mikofski 
wrote:

> I believe numpy arrays must be rectangular, yours is jagged, instead try
>
> >>> x3d = np.array([[[1, 2], [1, 2], [3, 4]]])
> >>> x3d.shape
> (1, 3, 2)
>
> Note: 3 opening brackets, yours has 2
> And single brackets around the 3 innermost arrays, yours has single
> brackets for the 1st, and double brackets around the 2nd and 3rd
>
>
> On Tue, Dec 25, 2018, 6:20 PM Jeff wrote:
>> hello,
>>
>> sorry, newbie to numpy.
>>
>> I want to define a three-dim array.
>> I know this works:
>>
>>  >>> np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
>> array([[[1, 2],
>>  [3, 4]],
>>
>> [[5, 6],
>>  [7, 8]]])
>>
>> But can you tell why this doesn't work?
>>
>>  >>> np.array([[1,2],[[1,2],[3,4]]])
>> Traceback (most recent call last):
>>File "", line 1, in 
>> ValueError: setting an array element with a sequence.
>>
>>
>> Thank you.
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] lstsq underdetermined behaviour

2018-11-18 Thread Eric Wieser
> In 1.15 the call is instead to `_umath_linalg.lstsq_m` and I'm not sure
what this actually ends up doing - does this end up being the same as
`dgelsd`?

When the arguments are real, yes. What changed is that the dispatching now
happens in C, which was done as a step towards the incomplete
https://github.com/numpy/numpy/issues/8720.

I'm not an expert - but aren't "minimum norm" and "least squares" two ways
to state the same thing?

Eric
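
A note for clarity: for an underdetermined system the two are not quite the
same - every exact solution has zero residual, so all are equally "least
squares", and "minimum norm" selects the one with the smallest 2-norm, which
is what dgelsd returns. A small check (illustrative):

```
import numpy as np

# one equation, two unknowns: x + y = 2 has infinitely many exact solutions
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

sol, res, rank, sv = np.linalg.lstsq(A, b, rcond=None)
sol                                               # array([1., 1.])

# (2, 0) solves the system just as exactly, but has a larger norm:
np.linalg.norm(sol) < np.linalg.norm([2.0, 0.0])  # True
```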

On Sun, 18 Nov 2018 at 20:04 Romesh Abeysuriya 
wrote:

> Hi all,
>
> I'm solving an underdetermined system using `numpy.linalg.lstsq` and
> trying to track down its behavior for underdetermined systems. In
> previous versions of numpy (e.g. 1.14) in `linalg.py` the definition
> for `lstsq` calls `dgelsd` for real inputs, which I think means that
> the underdetermined system is solved with the minimum-norm solution
> (that is, minimizing the norm of the solution vector, in addition to
> minimizing the residual). In 1.15 the call is instead to
> `_umath_linalg.lstsq_m` and I'm not sure what this actually ends up
> doing - does this end up being the same as `dgelsd`? If so, it would
> be great if the documentation for  `numpy.linalg.lstsq` stated that it
> is returning the minimum-norm solution (as it stands, it reads as
> undefined, so in theory I don't think one can rely on any particular
> solution being returned for an underdetermined system)
>
> Cheers,
> Romesh
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Vectorized version of numpy.linspace

2018-11-14 Thread Eric Wieser
I too buy into axis=0 being the better default here for broadcasting
reasons. Having it possible to specify explicitly would be handy though,
for something like:

x_ramped = np.linspace(x.min(axis=2), x.max(axis=2), 100, axis=2)
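
A runnable version of that sketch, using the axis argument that landed in
numpy 1.16 (shapes shown for illustration):

```
import numpy as np

x = np.random.rand(4, 5, 6)
lo, hi = x.min(axis=2), x.max(axis=2)        # both shape (4, 5)
ramp = np.linspace(lo, hi, num=100, axis=2)  # shape (4, 5, 100)
```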


On Wed, 14 Nov 2018 at 15:20 Stephan Hoyer  wrote:

> On Wed, Nov 14, 2018 at 3:16 PM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> I see the logic in having the linear space be last, but one
>> non-negligible advantage of the default being the first axis is that
>> whatever is produced broadcasts properly against start and stop.
>> -- Marten
>>
>
> Yes, this is exactly why I needed to insert the new axis at the start.
>
> That said, either default axis position is fine by me as long as we have
> the explicit option.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] asarray/anyarray; matrix/subclass

2018-11-10 Thread Eric Wieser
> If the only way MaskedArray violates Liskov is in terms of NA skipping
aggregations by default, then this might be viable

One of the ways to fix these liskov substitution problems is just to
introduce more base classes - for instance, if we had an `NDContainer` base
class with only slicing support, then masked arrays would be an exact
liskov substitution, but np.matrix would not.

Eric
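
A hypothetical sketch of such a base class - nothing below exists in numpy,
it only illustrates the idea:

```
from abc import ABC, abstractmethod

class NDContainer(ABC):
    """Slicing-only contract: a shape plus item access."""

    @property
    @abstractmethod
    def shape(self): ...

    @abstractmethod
    def __getitem__(self, index): ...

# MaskedArray could satisfy this contract exactly; np.matrix could not,
# because slicing a matrix never drops to 1d.
```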

On Sat, 10 Nov 2018 at 12:17 Stephan Hoyer  wrote:

> On Sat, Nov 10, 2018 at 9:49 AM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> Hi Hameer,
>>
>> I do not think we should change `asanyarray` itself to special-case
>> matrix; rather, we could start converting `asarray` to `asanyarray` and
>> solve the problems that produces for matrices in `matrix` itself (e.g., by
>> overriding the relevant function with `__array_function__`).
>>
>> I think the idea of providing an `__anyarray__` method (in analogy with
>> `__array__`) might work. Indeed, the default in `ndarray` (and thus all its
>> subclasses) could be to let it return `self`  and to override it for
>> `matrix` to return an ndarray view.
>>
>
> Yes, we certainly would rather implement a matrix.__anyarray__ method (if
> we're already doing a new protocol) rather than special case np.matrix
> explicitly.
>
> Unfortunately, per Nathaniel's comments about NA skipping behavior, it
> seems like we will also need MaskedArray.__anyarray__ to return something
> other than itself. In principle, we should probably write new version of
> MaskedArray that doesn't deviate from ndarray semantics, but that's a
> rather large project (we'd also probably want to stop subclassing ndarray).
>
> Changing the default aggregation behavior for the existing MaskedArray is
> also an option but that would be a serious annoyance to users and backwards
> compatibility break. If the only way MaskedArray violates Liskov is in
> terms of NA skipping aggregations by default, then this might be viable. In
> practice, this would require adding an explicit skipna argument so
> FutureWarnings could be silenced. The plus side of this option is that it
> would make it easier to use np.anyarray() or any new coercion function
> throughout the internal NumPy code base.
>
> To summarize, I think these are our options:
> 1. Change the behavior of np.anyarray() to check for an __anyarray__()
> protocol. Change np.matrix.__anyarray__() to return a base numpy array
> (this is a minor backwards compatibility break, but probably for the best).
> Start issuing a FutureWarning for any MaskedArray operations that violate
> Liskov and add a skipna argument that in the future will default to
> skipna=False.
> 2. Introduce a new coercion function, e.g., np.duckarray(). This is the
> easiest option because we don't need to cleanup NumPy's existing ndarray
> subclasses.
>
> P.S. I'm just glad pandas stopped subclassing ndarray a while ago --
> there's no way pandas.Series() could be fixed up to not violate Liskov :).
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy pprint?

2018-11-06 Thread Eric Wieser
Foad:

having the functionality for conventional consoles would also help

I think the most important thing in a conventional console is to output the
array in a format that allows you to reconstruct the object. That makes it
way easier for people to reproduce each other's problems without having
their full dataset. If your goal is to visualize complex arrays, I think
the console is a pretty limited tool, and numpy already does as much as is
worthwhile there.
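
For reference, the corner display being asked for already exists on the
plain console - these knobs control it:

```
import numpy as np

np.set_printoptions(threshold=20,  # summarize anything bigger than this
                    edgeitems=2)   # rows/columns to keep at each edge
print(np.arange(100).reshape(10, 10))
# [[ 0  1 ...  8  9]
#  [10 11 ... 18 19]
#  ...
#  [80 81 ... 88 89]
#  [90 91 ... 98 99]]
```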

I don’t think putting everything in boxes is helping. It is confusing. I
would rather have horizontal and vertical square brackets represent each
nested array

See my update at the same link, which shows an alternative that draws
those brackets as you envision.

it would be awesome if in IPython/Jupyter hovering over an element a popup
would show the index

It… already does?

to show L R U P or combination of these plus some numbers

I don’t know what you mean by this.

Eric

On Tue, 6 Nov 2018 at 00:56 Foad Sojoodi Farimani f.s.farim...@gmail.com wrote:

> Wow, this is awesome.
> Some points though:
>
>- not everybody uses IPython/Jupyter; having the functionality for
>conventional consoles would also help. Something like
>SymPy's init_printing/init_session, which smartly chooses the right
>representation considering the terminal.
>- I don't think putting everything in boxes is helping. It is
>confusing. I would rather have horizontal and vertical square brackets
>represent each nested array
>- it would be awesome if in IPython/Jupyter hovering over an element a
>popup would show the index
>- one could read the width and height of the terminal and the other
>options I mentioned in my reply to Mark to show L R U P or a combination of
>these plus some numbers (similar to Pandas' .head/.tail methods) and then
>show the rest by a unicode ellipsis
>
> P.S. I had no idea our university Microsoft services also offers Azure
> Notebooks awesome :P
>
> F.
>
> On Tue, Nov 6, 2018 at 9:45 AM Eric Wieser 
> wrote:
>
>> Here's how that could look
>>
>>
>> https://numpyintegration-ericwieser.notebooks.azure.com/j/notebooks/pprint.ipynb
>>
>> Feel free to play around and see if you can produce something more useful
>>
>>
>>
>> On Mon, 5 Nov 2018 at 23:28 Foad Sojoodi Farimani 
>> wrote:
>>
>>> It is not hijacking if I asked for it :))
>>> for IPython/Jupyter using Markdown/LaTeX would be awesome
>>> or even better using HTML to add sliders just like Pandas...
>>>
>>> F.
>>>
>>> On Tue, Nov 6, 2018 at 6:51 AM Eric Wieser 
>>> wrote:
>>>
>>>> Hijacking this thread while on the topic of pprint - we might want to
>>>> look into a table-based `_repr_html_` or `_repr_latex_` for use in ipython
>>>> - where we can print the full array and let scrollbars replace ellipses.
>>>>
>>>> Eric
>>>>
>>>> On Mon, 5 Nov 2018 at 21:11 Mark Harfouche 
>>>> wrote:
>>>>
>>>>> Foad,
>>>>>
>>>>> Visualizing data is definitely a complex field. I definitely feel your
>>>>> pain.
>>>>> Printing your data is but one way of visualizing it, and probably only
>>>>> useful for very small and constrained datasets.
>>>>> Have you looked into set_printoptions
>>>>> <https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.set_printoptions.html>
>>>>> to see how numpy’s existing capabilities might help you with your
>>>>> visualization?
>>>>>
>>>>> The code you showed seems quite good. I wouldn’t worry about
>>>>> performance when it comes to functions that will seldom be called in tight
>>>>> loops.
>>>>> As you’ll learn more about python and numpy, you’ll keep expanding it
>>>>> to include more use cases.
>>>>> For many of my projects, I create small submodules for visualization
>>>>> tailored to the specific needs of the particular project.
>>>>> I’ll try to incorporate your functions and see how I use them.
>>>>>
>>>>> Your original post seems to have some confusion about C Style vs F
>>>>> Style ordering. I hope that has been resolved.
>>>>> There is also a lot of good documentation
>>>>>
>>>>> https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html#numpy-for-matlab-users-notes
>>>>> about transitioning from matlab.
>>>>>
>>>>> Mark

Re: [Numpy-discussion] numpy pprint?

2018-11-06 Thread Eric Wieser
Here's how that could look

https://numpyintegration-ericwieser.notebooks.azure.com/j/notebooks/pprint.ipynb

Feel free to play around and see if you can produce something more useful



On Mon, 5 Nov 2018 at 23:28 Foad Sojoodi Farimani 
wrote:

> It is not hijacking if I asked for it :))
> for IPython/Jupyter using Markdown/LaTeX would be awesome
> or even better using HTML to add sliders just like Pandas...
>
> F.
>
> On Tue, Nov 6, 2018 at 6:51 AM Eric Wieser 
> wrote:
>
>> Hijacking this thread while on the topic of pprint - we might want to
>> look into a table-based `_repr_html_` or `_repr_latex_` for use in ipython
>> - where we can print the full array and let scrollbars replace ellipses.
>>
>> Eric
>>
>> On Mon, 5 Nov 2018 at 21:11 Mark Harfouche 
>> wrote:
>>
>>> Foad,
>>>
>>> Visualizing data is definitely a complex field. I definitely feel your
>>> pain.
>>> Printing your data is but one way of visualizing it, and probably only
>>> useful for very small and constrained datasets.
>>> Have you looked into set_printoptions
>>> <https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.set_printoptions.html>
>>> to see how numpy’s existing capabilities might help you with your
>>> visualization?
>>>
>>> The code you showed seems quite good. I wouldn’t worry about performance
>>> when it comes to functions that will seldom be called in tight loops.
>>> As you’ll learn more about python and numpy, you’ll keep expanding it to
>>> include more use cases.
>>> For many of my projects, I create small submodules for visualization
>>> tailored to the specific needs of the particular project.
>>> I’ll try to incorporate your functions and see how I use them.
>>>
>>> Your original post seems to have some confusion about C Style vs F Style
>>> ordering. I hope that has been resolved.
>>> There is also a lot of good documentation
>>>
>>> https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html#numpy-for-matlab-users-notes
>>> about transitioning from matlab.
>>>
>>> Mark
>>>
>>> On Mon, Nov 5, 2018 at 4:46 PM Foad Sojoodi Farimani <
>>> f.s.farim...@gmail.com> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> Following this question <https://stackoverflow.com/q/53126305/491>,
>>>> I'm convinced that numpy ndarrays are not MATLAB/mathematical
>>>> multidimensional matrices and I should stop expecting them to be. However I
>>>> still think it would have a lot of benefit to have a function like sympy's
>>>> pprint to pretty print. Something like pandas' .head and .tail methods plus
>>>> .left .right .UpLeft .UpRight .DownLeft .DownRight methods. When nothing is
>>>> mentioned it would show 4 corners and put dots in the middle if the array
>>>> is too big for the terminal.
>>>>
>>>> Best,
>>>> Foad
>>>> ___
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion@python.org
>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy pprint?

2018-11-05 Thread Eric Wieser
Hijacking this thread while on the topic of pprint - we might want to look
into a table-based `_repr_html_` or `_repr_latex_` for use in ipython -
where we can print the full array and let scrollbars replace ellipses.

Eric

On Mon, 5 Nov 2018 at 21:11 Mark Harfouche  wrote:

> Foad,
>
> Visualizing data is definitely a complex field. I definitely feel your
> pain.
> Printing your data is but one way of visualizing it, and probably only
> useful for very small and constrained datasets.
> Have you looked into set_printoptions
> 
> to see how numpy’s existing capabilities might help you with your
> visualization?
>
> The code you showed seems quite good. I wouldn’t worry about performance
> when it comes to functions that will seldom be called in tight loops.
> As you’ll learn more about python and numpy, you’ll keep expanding it to
> include more use cases.
> For many of my projects, I create small submodules for visualization
> tailored to the specific needs of the particular project.
> I’ll try to incorporate your functions and see how I use them.
>
> Your original post seems to have some confusion about C Style vs F Style
> ordering. I hope that has been resolved.
> There is also a lot of good documentation
>
> https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html#numpy-for-matlab-users-notes
> about transitioning from matlab.
>
> Mark
>
> On Mon, Nov 5, 2018 at 4:46 PM Foad Sojoodi Farimani <
> f.s.farim...@gmail.com> wrote:
>
>> Hello everyone,
>>
>> Following this question ,
>> I'm convinced that numpy ndarrays are not MATLAB/mathematical
>> multidimensional matrices and I should stop expecting them to be. However I
>> still think it would have a lot of benefit to have a function like sympy's
>> pprint to pretty print. Something like pandas' .head and .tail methods plus
>> .left .right .UpLeft .UpRight .DownLeft .DownRight methods. When nothing is
>> mentioned it would show 4 corners and put dots in the middle if the array
>> is too big for the terminal.
>>
>> Best,
>> Foad
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Attribute hiding APIs for PyArrayObject

2018-10-30 Thread Eric Wieser
In NumPy 1.14 we changed UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we
would like to deprecate PyArray_SetNumericOps and PyArray_GetNumericOps. The
strange warning when NPY_NO_DEPRECATED_API is not defined is annoying

I’m not sure I make the connection here between hidden fields and API
deprecation. You seem to be asking two vaguely related questions:

   1. Should we have deprecated field access in the first place
   2. Does our api deprecation mechanism need work

I think a more substantial problem statement is needed for 2, so I’m only
going to respond to 1 here.

Hiding fields seems to me to match the CPython model of things, where your
public api is PyArray_SomeGetter(thing).
If you look at the cpython source code,
they only expose the underlying struct fields if you don’t define
Py_LIMITED_API, ie if you as a consumer volunteer to be broken by upstream
changes in minor versions. People (like us) are willing to produce separate
builds for each python versions, so often do not define this.

We could add a similar PyArray_LIMITED_API that allows field access under a
similar guarantee - the question is, are many downstream consumers willing
to produce builds against multiple numpy versions? (especially if they also
do so against multiple python versions)

Also, for example, cython has a mechanism to transpile python code into C,
mapping slow python attribute lookup to fast C struct field access

How does this work for builtin types? Does cython deliberately not define
Py_LIMITED_API? Or are you just forced to use PyTuple_GetItem(t) if you
want the fast path.

Eric

On Tue, 30 Oct 2018 at 02:04 Matti Picus  wrote:

TL;DR - should we revert the attribute-hiding constructs in
> ndarraytypes.h and unify PyArrayObject_fields with PyArrayObject?
>
>
> Background
>
>
> NumPy 1.8 deprecated direct access to PyArrayObject fields. It made
> PyArrayObject "opaque", and hid the fields behind a PyArrayObject_fields
> structure
>
> https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/include/numpy/ndarraytypes.h#L659
> with a comment about moving this to a private header. In order to access
> the fields, users are supposed to use PyArray_FIELDNAME functions, like
> PyArray_DATA and PyArray_NDIM. It seems there were thoughts at the time
> that numpy might move away from a C-struct based
>
> underlying data structure. Other changes were also made to enum names,
> but those are relatively painless to find-and-replace.
>
>
> NumPy has a mechanism to manage deprecating APIs, C users define
> NPY_NO_DEPRECATED_API to a desired level, say NPY_1_8_API_VERSION, and
> can then access the API "as if" they were using NumPy 1.8. Users who do
> not define NPY_NO_DEPRECATED_API get a warning when compiling, and
> default to the pre-1.8 API (aliasing of PyArrayObject to
> PyArrayObject_fields and direct access to the C struct fields). This is
> convenient for downstream users, both since the new API does not provide
> much added value, and it is much easier to write a->nd than
> PyArray_NDIM(a). For instance, pandas uses direct assignment to the data
> field for fast json parsing
>
> https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/python/JSONtoObj.c#L203
> via chunks. Working around the new API in pandas would require more
> engineering. Also, for example, cython has a mechanism to transpile
> python code into C, mapping slow python attribute lookup to fast C
> struct field access
>
> https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#external-extension-types
>
>
> In a parallel but not really related universe, cython recently upgraded
> the object mapping so that we can quiet the annoying "size changed"
> runtime warning https://github.com/numpy/numpy/issues/11788 without
> requiring warning filters, but that requires updating the numpy.pxd file
> provided with cython, and it was proposed that NumPy actually vendor its
> own file rather than depending on the cython one
> (https://github.com/numpy/numpy/issues/11803).
>
>
> The problem
>
>
> We have now made further changes to our API. In NumPy 1.14 we changed
> UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate
> PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning
> when NPY_NO_DEPRECATED_API is not defined is annoying. The new API cannot be supported
> by cython without some deep surgery
> (https://github.com/cython/cython/pull/2640). When I tried dogfooding an
> updated numpy.pxd for the only cython code in NumPy, mtrand.pxy, I came
> across some of these issues (https://github.com/numpy/numpy/pull/12284).
> Forcing the new API will require downstream users to refactor code or
> re-engineer constructs, as in the pandas example above.
>
>
> The question
>
>
> Is the attribute-hiding effort worth it? Should we give up, revert the
> PyArrayObject/PyArrayObject_fields division and allow direct access from
> C to the numpy internals? Is there anoth

Re: [Numpy-discussion] asanyarray vs. asarray

2018-10-29 Thread Eric Wieser
The latter - changing the behavior of multiplication breaks the principle.

But this is not the main reason for deprecating matrix - almost all of the
problems I’ve seen have been caused by the way that matrices behave when
sliced. The way that m[i][j] and m[i,j] are different is just one example
of this, the fact that they must be 2d is another.

Matrices behaving differently on multiplication isn’t super different in my
mind to how string arrays fail to multiply at all.

Eric
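
To make the slicing problem concrete (illustration):

```
import numpy as np

a = np.array([[1, 2], [3, 4]])
m = np.matrix(a)

a[0]     # array([1, 2])     - slicing an ndarray drops to 1d
m[0]     # matrix([[1, 2]])  - a matrix always stays 2d
a[0][1]  # 2                 - same as a[0, 1]
m[0][1]  # IndexError        - m[0] is still a 1x2 matrix, so row 1 is gone
m[0, 1]  # 2
```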

On Mon, 29 Oct 2018 at 20:54 Ralf Gommers  wrote:

On Mon, Oct 29, 2018 at 4:31 PM Chris Barker  wrote:
>
>> On Fri, Oct 26, 2018 at 7:12 PM, Travis Oliphant 
>> wrote:
>>
>>
>>> I agree that we can stop bashing subclasses in general.   The problem
>>> with numpy subclasses is that they were made without adherence to SOLID:
>>> https://en.wikipedia.org/wiki/SOLID.  In particular the Liskov
>>> substitution principle:
>>> https://en.wikipedia.org/wiki/Liskov_substitution_principle .
>>>
>>
>> ...
>>
>>
>>> did not properly apply them in creating np.matrix which clearly violates
>>> the substitution principle.
>>>
>>
>> So -- could a matrix subclass be made "properly"? or is that an example
>> of something that should not have been a subclass?
>>
>
> The latter - changing the behavior of multiplication breaks the principle.
>
> Ralf
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Deprecating asfortranarray and ascontiguousarray

2018-10-26 Thread Eric Wieser
in order to be used prior to calling C or Fortran code that expected at
least a 1-d array

I’d argue that the behavior for these functions should have just been to
raise an error saying “this function does not support 0d arrays”, rather
than silently inserting extra dimensions. As a bonus, that would push the
function developers to add support for 0d. Obviously we can’t make it do
that now, but what we can do is have it emit a warning in those cases.

I think our options are:

   1. Deprecate the entire function
   2. Deprecate and eventually(?) throw an error upon calling the function
   on 0d arrays, with a message like *“in future using ascontiguousarray to
   promote 0d arrays to 1d arrays will not be supported. If promotion is
   intentional, use ascontiguousarray(atleast_1d(x)) to silence this warning
   and keep the old behavior, and if not use asarray(x, order='C') to preserve
   0d arrays”*
   3. Deprecate (future-warning) when passed 0d arrays, and eventually skip
   the upcast to 1d.
   If the calling code really needed a 1d array, then it will probably
   fail, which is not really different to 2, but has the advantage that the
   names are less surprising.
   4. Only improve the documentation

My preference would be 3

Eric
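
To spell out what those options mean for the 0d corner case (shapes checked
against the behaviour described above):

```
import numpy as np

x = np.zeros([])                              # 0d, already C-contiguous
np.ascontiguousarray(x).shape                 # (1,) - today's silent promotion
np.asarray(x, order='C').shape                # ()   - what option 3 would return
np.ascontiguousarray(np.atleast_1d(x)).shape  # (1,) - option 2's explicit spelling
```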

On Fri, 26 Oct 2018 at 17:35 Travis Oliphant  wrote:

On Fri, Oct 26, 2018 at 7:14 PM Alex Rogozhnikov 
> wrote:
>
>> > If the desire is to shrink the API of NumPy, I could see that.
>>
>> Very good desire, but my goal was different.
>>
>
>> > For some users this is exactly what is wanted.
>>
>> Maybe so, but I didn't face such an example (and nobody mentioned one so
>> far in the discussion).
>> The opposite (according to the issue) happened. Mxnet example is
>> sufficient in my opinion.
>>
>
> I agree that the old motivation of APIs that would make it easy to create
> SciPy is no longer a major motivation for most users and even developers
> and so these reasons would not be very present (as well as why it wasn't
> even mentioned in the documentation).
>
>
>> Simple example:
>> x = np.zeros([])
>> assert(x.flags.c_contiguous)
>> assert(np.ascontiguousarray(x).shape == x.shape)
>>
>> Behavior contradicts the documentation (the shape is changed) and the name
>> (the flags say it is already c_contiguous)
>>
>> If you insist that keeping ndmin=1 is important (I am not yet convinced,
>> but I am ready to believe your authority),
>> we can add ndmin=1 to functions' signatures, this way explicitly
>> notifying users about expected dimension.
>>
>
> I understand the lack of being convinced.  This is ultimately a problem of
> 0-d arrays not being fully embraced and accepted by the Numeric community
> originally (which NumPy inherited during the early days).   Is there a way
> to document functions that will be removed on a major version increase
> which don't print warnings on use? I would support this.
>
> I'm a big supporter of making a NumPy 2.0 and have been for several years.
> Now that Python 3 transition has happened, I think we could seriously
> discuss this.  I'm trying to raise funding for maintenance and progress for
> NumPy and SciPy right now via Quansight Labs http://www.quansight.com/labs
> and I hope to be able to help find grants to support the wonderful efforts
> that have been happening for some time.
>
> While I'm thrilled and impressed by the number of amazing devs who have
> kept NumPy and SciPy going in mostly their spare time, it has created
> challenges: we have not had continuous maintenance funding to allow
> continuous paid development, so several people who know about the early
> decisions could not be retained to spend time on helping the transition.
>
> Your bringing the problem of mxnet devs is most appreciated.  I will make
> a documentation PR.
>
> -Travis
>
>
>
>
>> Alex.
>>
>>
>> 27.10.2018, 02:27, "Travis Oliphant" :
>>
>> What is the justification for deprecation exactly?  These functions have
>> been well documented and have had the intended behavior of producing arrays
>> with dimension at least 1 for some time.  Why is it unexpected to produce
>> arrays of at least 1 dimension?  For some users this is exactly what is
>> wanted.  I don't understand the statement that behavior with 0-d arrays is
>> unexpected.
>>
>> If the desire is to shrink the API of NumPy, I could see that.   But, it
>> seems odd to me to remove a much-used function with an established behavior
>> except as part of a wider API-shrinkage effort.
>>
>> 0-d arrays in NumPy are a separate conversation.  At this point, I think
>> it was a mistake not to embrace 0-d arrays in NumPy from day one.  In some
>> sense 0-d arrays *are* scalars at least conceptually and for JIT-producing
>> systems that exist now and will be growing in the future, they can be
>> equivalent to scalars.
>>
>> The array scalars should become how you define what is *in* a NumPy array
>> making them true Python types, rather than Python 1-style "instances" of a
>> single "Dtype" object.  You would the

Re: [Numpy-discussion] asanyarray vs. asarray

2018-10-19 Thread Eric Wieser
Subclasses such as MaskedArray and, yes, Quantity, are widely used, and if
they cause problems perhaps that should be seen as a sign that ndarray
subclassing should be made easier and clearer.

Both maskedarray and quantity seem like something that would make more
sense at the dtype level if our dtype system was easier to extend. It might
be good to compile a list of subclassing applications, and split them into
“this ought to be a dtype” and “this ought to be a different type of
container”.
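
As a reminder of the distinction under discussion (illustration):

```
import numpy as np

m = np.matrix([[1, 2], [3, 4]])

type(np.asarray(m))     # <class 'numpy.ndarray'> - subclass stripped
type(np.asanyarray(m))  # <class 'numpy.matrix'>  - subclass passed through
```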

On Fri, 19 Oct 2018 at 18:24 Marten van Kerkwijk 
wrote:

> Hi All,
>
> It seems there are two extreme possibilities for general functions:
> 1. Put `asarray` everywhere. The main benefit that I can see is that even
> if people put in list instead of arrays, one is guaranteed to have shape,
> dtype, etc. But it seems a bit like calling `int` on everything that might
> get used as an index, instead of letting the actual indexing do the proper
> thing and call `__index__`.
> 2. Do not coerce at all, but rather write code assuming something is an
> array already. This will often, but not always, just work for array mimics,
> with coercion done only where necessary (e.g., in lower-lying C code such
> as that of the ufuncs which has a smaller API surface and can be overridden
> more easily).
>
> The current __array_function__ work may well provide us with a way to
> combine both, if we (over time) move the coercion inside
> `ndarray.__array_function__` so that the actual implementation *can* assume
> it deals with pure ndarray - then, when relevant, calling that
> implementation will be what subclasses/duck arrays can happily do (and it
> is up to them to ensure this works).
>
> Of course, the above does not really answer what to do in the meantime.
> But perhaps it helps in thinking of what we are actually aiming for.
>
> One last thing: could we please stop bashing subclasses? One can subclass
> essentially everything in python, often to great advantage. Subclasses such
> as MaskedArray and, yes, Quantity, are widely used, and if they cause
> problems perhaps that should be seen as a sign that ndarray subclassing
> should be made easier and clearer.
>
> All the best,
>
> Marten
>
>
> On Fri, Oct 19, 2018 at 7:02 PM Ralf Gommers 
> wrote:
>
>>
>>
>> On Fri, Oct 19, 2018 at 10:28 PM Ralf Gommers 
>> wrote:
>>
>>>
>>>
>>> On Fri, Oct 19, 2018 at 4:15 PM Hameer Abbasi 
>>> wrote:
>>>
>>>> Hi!
>>>>
>>>> On Friday, Oct 19, 2018 at 6:09 PM, Stephan Hoyer 
>>>> wrote:
>>>> I don't think it makes much sense to change NumPy's existing usage of
>>>> asarray() to asanyarray() unless we add subok=True arguments (which default
>>>> to False). But this ends up cluttering NumPy's public API, which is also
>>>> undesirable.
>>>>
>>>> Agreed so far.
>>>>
>>>
>>> I'm not sure I agree. "subok" is very unpythonic; the average numpy
>>> library function should work fine for a well-behaved subclass (i.e. most
>>> things out there except np.matrix).
>>>
>>>>
>>>> The preferred way to override NumPy functions going forward should be
>>>> __array_function__.
>>>>
>>>>
>>>> I think we should “soft support” i.e. allow but consider unsupported,
>>>> the case where one of NumPy’s functions is implemented in terms of others
>>>> and “passing through” an array results in the correct behaviour for that
>>>> array.
>>>>
>>>
>>> I don't think we have or want such a concept as "soft support". We
>>> intend to not break anything that now has asanyarray, i.e. it's supported
>>> and ideally we have regression tests for all such functions. For anything
>>> we transition over from asarray to asanyarray, PRs should come with new
>>> tests.
>>>
>>>
>>>>
>>>> On Fri, Oct 19, 2018 at 8:13 AM Marten van Kerkwijk <
>>>> m.h.vankerkw...@gmail.com> wrote:
>>>>
>>>>> There are exceptions for `matrix` in quite a few places, and there now
>>>>> is a warning for `matrix` - it might not be bad to use `asanyarray` and add
>>>>> an exception for `matrix`. Indeed, I quite like the suggestion by Eric
>>>>> Wieser to just add the exception to `asanyarray` itself - that way when
>>>>> matrix is truly deprecated, it will be a very easy change.
>>>>>
>>>> I don't quite understand this. Adding exceptions is not deprecati

Re: [Numpy-discussion] ndrange, like range but multidimensional

2018-10-11 Thread Eric Wieser
I’m not sure if we ever want the ndrange object to return a full matrix.

np.array(ndrange(...)) should definitely return a full array, because
that’s what the user asked for.

Even if you supply a numpy uint8 to range, it still returns a python int
class.

If we want to design ndrange with the intent of indexing only, then it
should probably always use np.intp, whatever the type of the provided
arguments.

Would you like ndrange to return a tuple of uint8 in this case?

Tuples are just one of the four options I listed in a previous message. The
downside of tuples is there’s no easy way to say “take just the first axis
of this range”.
Whatever we pick, the return value should be such that
np.array(ndrange(...))[idx] == ndrange(...)[idx]

On Thu, 11 Oct 2018 at 18:54 Mark Harfouche 
wrote:

> Eric, interesting ideas.
>
> > __getitem__(Tuple[int]) which returns numpy scalars
>
> I'm not sure what you mean. Even if you supply a numpy uint8 to range, it
> still returns a python int class.
> Would you like ndrange to return a tuple of `uint8` in this case?
>
> ```
> In [3]: a =
> iter(range(np.uint8(10)))
>
> In [4]:
> next(a).__class__
> Out[4]: int
>
> In [5]:
> np.uint8(10).__class__
> Out[5]: numpy.uint8
> ```
>
> Ravel seems like a cool way to choose iteration order. In the PR, I
> mentioned that one reason I removed `'F'` order from the PR was:
> 1. My implementation was not competitive with the `C` order implementation
> in terms of speed (can be fixed)
> 2. I don't know if it something that people really need to iterate over
> collections (annoying to maintain if unused)
>
> Instead, I just showed an example how people could iterate in `F` order
> should they need to.
>
> I'm not sure if we ever want the `ndrange` object to return a full matrix.
> It seems like we would be creating a custom tuple class just for this which
> seems pretty niche.
>
>
> On Thu, Oct 11, 2018 at 10:21 AM Eric Wieser 
> wrote:
>
>> Isn’t that what arange is for?
>>
>> Imagining ourselves in python2 land for now - I’m proposing arange is to
>> range, as ndrange is to xrange
>>
>> I’m not convinced it should return an ndarray
>>
>> I agree - I think it should return a range-like object that:
>>
>>- Is convertible via __array__ if needed
>>- Looks like an ndarray, with:
>>   - a .dtype attribute
>>   - a __getitem__(Tuple[int]) which returns numpy scalars
>>   - .ravel() and .flat for choosing iteration order.
>>
>> On Wed, 10 Oct 2018 at 11:21 Allan Haldane allanhald...@gmail.com wrote:
>>
>> On 10/10/18 12:34 AM, Eric Wieser wrote:
>>> > One thing that worries me here - in python, `range(...)` in essence
>>> > generates a lazy `list` - so I’d expect `ndrange` to generate a lazy
>>> > `ndarray`. In practice, that means it would be a duck-type defining an
>>> > `__array__` method to evaluate it, and only implement methods already
>>> > present in numpy.
>>>
>>> Isn't that what arange is for?
>>>
>>> It seems like there are two uses of python3's range: 1. creating a 1d
>>> iterable of indices for use in for-loops, and 2. with list(range) can be
>>> used to create a sequence of integers.
>>>
>>> Numpy can extend this in two directions:
>>>  * ndrange returns an iterable of nd indices (for for-loops).
>>>  * arange returns an 1d ndarray of integers instead of a list
>>>
>>> The application of for-loops, which is more niche, doesn't need
>>> ndarray's vectorized properties, so I'm not convinced it should return
>>> an ndarray. It certainly seems simpler not to return an ndarray, due to
>>> the dtype question.
>>>
>>> arange on its own seems to cover the need for a vectorized version of
>>> range.
>>>
>>> Allan
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndrange, like range but multidimensional

2018-10-11 Thread Eric Wieser
Isn’t that what arange is for?

Imagining ourselves in python2 land for now - I’m proposing arange is to
range, as ndrange is to xrange

I’m not convinced it should return an ndarray

I agree - I think it should return a range-like object (sketched below) that:

   - Is convertible via __array__ if needed
   - Looks like an ndarray, with:
  - a .dtype attribute
  - a __getitem__(Tuple[int]) which returns numpy scalars
  - .ravel() and .flat for choosing iteration order.
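
A hypothetical sketch of such an object - nothing below exists in numpy, it
only illustrates the bullet points above:

```
import numpy as np

class ndrange:
    def __init__(self, *shape):
        self.shape = shape
        self.dtype = np.dtype([('i%d' % k, np.intp)
                               for k in range(len(shape))])

    def __iter__(self):
        # iterate like nested for-loops, yielding index tuples
        return iter(np.ndindex(*self.shape))

    def __array__(self):
        # evaluate lazily, only when an ndarray is actually requested
        out = np.empty(self.shape, self.dtype)
        for idx in np.ndindex(*self.shape):
            out[idx] = idx
        return out

for i, j in ndrange(2, 3):
    pass  # visits (0, 0), (0, 1), ..., (1, 2)
```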

On Wed, 10 Oct 2018 at 11:21 Allan Haldane allanhald...@gmail.com wrote:

On 10/10/18 12:34 AM, Eric Wieser wrote:
> > One thing that worries me here - in python, `range(...)` in essence
> > generates a lazy `list` - so I’d expect `ndrange` to generate a lazy
> > `ndarray`. In practice, that means it would be a duck-type defining an
> > `__array__` method to evaluate it, and only implement methods already
> > present in numpy.
>
> Isn't that what arange is for?
>
> It seems like there are two uses of python3's range: 1. creating a 1d
> iterable of indices for use in for-loops, and 2. with list(range) can be
> used to create a sequence of integers.
>
> Numpy can extend this in two directions:
>  * ndrange returns an iterable of nd indices (for for-loops).
>  * arange returns an 1d ndarray of integers instead of a list
>
> The application of for-loops, which is more niche, doesn't need
> ndarray's vectorized properties, so I'm not convinced it should return
> an ndarray. It certainly seems simpler not to return an ndarray, due to
> the dtype question.
>
> arange on its own seems to cover the need for a vectorized version of
> range.
>
> Allan
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndrange, like range but multidimensional

2018-10-11 Thread Eric Wieser
If you use this as the dtype, you both set and get element as tuples.

Elements are not retrieved as tuples, but they can be explicitly cast

What about it makes tuple coercion awkward?

This explicit cast

>>> dt_ind2d = np.dtype([('i0', np.intp), ('i1', np.intp)])
>>> m = np.eye(2)  # some array to index
>>> ind = np.zeros(2, dt_ind2d)[0]
>>> ind, type(ind)
((0, 0), <class 'numpy.void'>)
>>> m[ind]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: only integers, slices (`:`), ellipsis (`...`),
numpy.newaxis (`None`) and integer or boolean arrays are valid indices
>>> m[tuple(ind)]
1.0

On Wed, 10 Oct 2018 at 09:08 Stephan Hoyer sho...@gmail.com wrote:

On Tue, Oct 9, 2018 at 9:34 PM Eric Wieser 
> wrote:
>
>> One thing that worries me here - in python, range(...) in essence
>> generates a lazy list - so I’d expect ndrange to generate a lazy ndarray.
>> In practice, that means it would be a duck-type defining an __array__
>> method to evaluate it, and only implement methods already present in numpy.
>>
>> It’s not clear to me what the datatype of such an array-like would be.
>> Candidates I can think of are:
>>
>>1. [('i0', intp), ('i1', intp), ...], but this makes tuple coercion a
>>little awkward
>>
>> I think this would be the appropriate choice. What about it makes tuple
> coercion awkward? If you use this as the dtype, you both set and get
> elements as tuples.
>
> In particular, I would say that ndrange() should be a lazy equivalent to
> the following explicit constructor:
>
> def ndrange(shape):
>     dtype = [('i' + str(i), np.intp) for i in range(len(shape))]
>     array = np.empty(shape, dtype)
>     for indices in np.ndindex(*shape):
>         array[indices] = indices
>     return array
>
> >>> ndrange((2,))
> array([(0,), (1,)], dtype=[('i0', '<i8')])
>
> >>> ndrange((2, 3))
> array([[(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)]],
>       dtype=[('i0', '<i8'), ('i1', '<i8')])
> The one deviation in behavior would be that ndrange() iterates over
> flattened elements rather than the first axes.
>
> It is indeed a little awkward to have field names, but given that NumPy
> creates those automatically when you supply a dtype like 'i8,i8' this is
> probably a reasonable choice.
>
>
>>2. (intp, (N,)) - which collapses into a shape + (N,) array
>>3. object_.
>>4. Some new np.tuple_ dtype, a heterogeneous tuple, which is like the
>>structured np.void but without field names. I’m not sure how
>>vectorized element indexing would be spelt though.
>>
>> Eric
>> ​
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndrange, like range but multidimensional

2018-10-09 Thread Eric Wieser
One thing that worries me here - in python, range(...) in essence generates
a lazy list - so I’d expect ndrange to generate a lazy ndarray. In
practice, that means it would be a duck-type defining an __array__ method
to evaluate it, and only implement methods already present in numpy.

It’s not clear to me what the datatype of such an array-like would be.
Candidates I can think of are:

   1. [('i0', intp), ('i1', intp), ...], but this makes tuple coercion a
   little awkward
   2. (intp, (N,)) - which collapses into a shape + (N,) array
   3. object_.
   4. Some new np.tuple_ dtype, a heterogeneous tuple, which is like the
   structured np.void but without field names. I’m not sure how vectorized
   element indexing would be spelt though.

Eric

On Tue, 9 Oct 2018 at 21:59 Stephan Hoyer  wrote:

> The speed difference is interesting but really a different question than
> the public API.
>
> I'm coming around to ndrange(). I can see how it could be useful for
> symbolic manipulation of arrays and indexing operations, similar to what we
> do in dask and xarray.
>
> On Mon, Oct 8, 2018 at 4:25 PM Mark Harfouche 
> wrote:
>
>> since ndrange is a superset of the features of ndindex, we can implement
>> ndindex with ndrange or keep it as is.
>> ndindex is now a glorified `nditer` object anyway. So it isn't so much of
>> a maintenance burden.
>> As for how ndindex is implemented, I'm a little worried about python 2
>> performance seeing as range is a list.
>> I would wait on changing the way ndindex is implemented for now.
>>
>> I agree with Stephan that ndindex should be kept in. Many want backward
>> compatible code. It would be hard for me to justify why a dependency should
>> be bumped up to bleeding edge numpy just for a convenience iterator.
>>
>> Honestly, I was really surprised to see such a speed difference, I
>> thought it would have been closer.
>>
>> Allan, I decided to run a few more benchmarks; the nditer just seems slow
>> for single array access for some reason. Maybe a bug?
>>
>> ```
>> import numpy as np
>> import itertools
>> a = np.ones((1000, 1000))
>>
>> b = {}
>> for i in np.ndindex(a.shape):
>> b[i] = i
>>
>> %%timeit
>> # op_flag=('readonly',) doesn't change performance
>> for a_value in np.nditer(a):
>> pass
>> 109 ms ± 921 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>
>> %%timeit
>> for i in itertools.product(range(1000), range(1000)):
>> a_value = a[i]
>> 113 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>
>> %%timeit
>> for i in itertools.product(range(1000), range(1000)):
>> c = b[i]
>> 193 ms ± 3.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>
>> %%timeit
>> for a_value in a.flat:
>> pass
>> 25.3 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>
>> %%timeit
>> for k, v in b.items():
>> pass
>> 19.9 ms ± 675 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>
>> %%timeit
>> for i in itertools.product(range(1000), range(1000)):
>> pass
>> 28 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>> ```
>>
>> On Mon, Oct 8, 2018 at 4:26 PM Stephan Hoyer  wrote:
>>
>>> I'm open to adding ndrange, and "soft-deprecating" ndindex (i.e.,
>>> discouraging its use in our docs, but not actually deprecating it).
>>> Certainly ndrange seems like a small but meaningful improvement in the
>>> interface.
>>>
>>> That said, I'm not convinced this is really worth the trouble. I think
>>> the nested loop is still pretty readable/clear, and there are few times
>>> when I've actually found ndindex() be useful.
>>>
>>> On Mon, Oct 8, 2018 at 12:35 PM Allan Haldane 
>>> wrote:
>>>
 On 10/8/18 12:21 PM, Mark Harfouche wrote:
 > 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like
 > `range`, is not an iterator. Changing this behaviour would likely lead
 > to breaking code that uses that assumption. For example anybody using
 > introspection or code like:
 >
 > ```
 > indx = np.ndindex(5, 5)
 > next(indx)  # Don't look at the (0, 0) coordinate
 > for i in indx:
 > print(i)
 > ```
 > would break if `ndindex` becomes "not an iterator"

 OK, I see now. Just like python3 has separate range and range_iterator
 types, where range is sliceable, we would have separate ndrange and
 ndindex types, where ndrange is sliceable. You're just copying the
 python3 api. That justifies it pretty well for me.

 I still think we shouldn't have two functions which do nearly the same
 thing. We should only have one, and get rid of the other. I see two ways
 forward:

  * replace ndindex by your ndrange code, so it is no longer an iter.
This would require some deprecation cycles for the cases that break.
  * deprecate ndindex in favor of a new function ndrange. We would keep
ndindex around for back-compatibility, with a dep warning to use
ndrange instead.

 Doing a code search on 

Re: [Numpy-discussion] Adding a hex version like PY_VERSION_HEX

2018-10-09 Thread Eric Wieser
+1 on Ralf's suggestion. I'm not sure there's any case where the C code
should be using a hex version number - either it's using the C api, in
which case it should just be looking at the C api version - or it's calling
back into the python API, in which case it's probably not unreasonable to
ask it to inspect `np.__version__` / a hypothetical `sys.version_info`,
since it's already going through awkwardness to invoke pure-python APIs.

Eric
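
An illustration of the proposed packing (not an existing numpy API):

```
MAJOR, MINOR, MICRO = 1, 16, 2
hex(MAJOR << 24 | MINOR << 16 | MICRO << 8)  # '0x1100200'

# why packed integers beat comparing version strings:
"0.9.0" > "0.10.0"                           # True - lexicographic, wrong
(9 << 16) < (10 << 16)                       # True - 0.9.0 < 0.10.0, right
```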

On Wed, 10 Oct 2018 at 04:23 Ralf Gommers  wrote:

> On Sat, Oct 6, 2018 at 11:24 PM Matti Picus  wrote:
>
>> On 05/10/18 11:46, Jerome Kieffer wrote:
>> > On Fri, 5 Oct 2018 11:31:20 +0300
>> > Matti Picus  wrote:
>> >
>> >> In PR 12074 https://github.com/numpy/numpy/pull/12074 I propose
>> adding a
>> >> function `version.get_numpy_version_as_hex()` which returns a hex value
>> >> to represent the current NumPy version MAJOR.MINOR.MICRO where
>> >>
>> >> v = hex(MAJOR << 24 | MINOR << 16 | MICRO)
>> > +1
>> >
>> > We use it in our code and it is a good practice, much better than
>> "0.9.0" > "0.10.0"!
>> >
>> > We added some support for dev, alpha, beta, RC and final versions in
>> > https://github.com/silx-kit/silx/blob/master/version.py
>> >
>> > Cheers,
>> Thanks. I think at this point I will change the proposal to
>>
>> v = hex(MAJOR << 24 | MINOR << 16 | MICRO << 8)
>>
>> which leaves room for future enhancement with "release level" and
>> "serial" as the lower bits.
>>
>
> Makes sense, but to me adding a tuple (like sys.version_info) would be
> more logical. Do that as well or instead of?
>
> Cheers,
> Ralf
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] C99

2018-09-07 Thread Eric Wieser
Thanks for the first step on this!

Should we allow // style comments

I don’t think it matters too much. I think it might be a little messy to
have a mix of the two styles where // means “post py3” and /* */ means
pre-py3 - but at the same time, I do slightly prefer the C++-style. For C
contributors coming from python, I’d expect that it feels more natural to
only have to put a comment marker at the start of the line. We could
convert the /**/-style to //-style with a tool, but it’s probably not worth
the churn or time.

Should we allow variable declarations after code

I’d be very strongly in favor of this - it makes it much easier to extract
helper functions if variables are declared as late as they can be - plus it
make it easier to reason about early returns not needing goto fail.

Related to this feature, I think allowing for(int i = 0; i < N; i++) is a
clear win.

Eric

On Fri, 7 Sep 2018 at 18:56 Charles R Harris charlesr.har...@gmail.com
 wrote:

Hi All,
>
> I've a PR up converting travis testing to use C99
> . I suspect we may not want to
> merge it for a while, but it does raise a couple of style questions that we
> should probably settle up front. Namely:
>
>
>- Should we allow // style comments
>- Should we allow variable declarations after code
>
> I am sure there are others to consider that haven't occurred to me. I
> confess that I am not a big fan of allowing either, but am probably
> prejudiced by early familiarity with C89 and long years working to that
> spec.
>
> Thoughts?
>
> Chuck
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Possible Deprecation of np.ediff1d

2018-08-27 Thread Eric Wieser
There is already a patch to add such a feature to np.diff at
https://github.com/numpy/numpy/pull/8206
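
For context, how the two overlap today; the prepend/append spelling is what
that PR adds (it became available in numpy 1.16):

```
import numpy as np

x = np.array([1, 4, 9])
np.ediff1d(x, to_begin=0, to_end=99)  # array([ 0,  3,  5, 99])
np.diff(x, prepend=1)                 # array([0, 3, 5])
```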

On Mon, 27 Aug 2018 at 10:47 Charles R Harris 
wrote:

> On Mon, Aug 27, 2018 at 11:37 AM Robert Kern 
> wrote:
>
>> On Mon, Aug 27, 2018 at 10:30 AM Tyler Reddy 
>> wrote:
>>
>>> Chuck suggested (
>>> https://github.com/numpy/numpy/pull/11805#issuecomment-416069436 ) that
>>> we may want to consider deprecating np.ediff1d, which is perhaps not much
>>> more useful than np.diff, apart from having some arguably strange prepend /
>>> append behavior added in.
>>>
>>> Related discussion on SO:
>>> https://stackoverflow.com/questions/39014324/difference-between-numpy-ediff1d-and-diff
>>>
>>> Thoughts?
>>>
>>
>> Huh. Never knew this existed. I'd say about 50% of the time I use
>> np.diff(), I'm doing that prepend/append behavior manually (and less
>> readably, possibly inefficiently, but most importantly annoyingly).
>>
>
> I was thinking we might want to add something to `np.diff`, maybe using
> `np.pad`.
>
> Chuck
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] pytest, fixture and parametrize

2018-08-08 Thread Eric Wieser
Is another nice feature

You forget that we already have that feature in our testing layer,

with np.testing.assert_raises(AnException):
    pass


On Wed, 8 Aug 2018 at 09:08 Chris Barker - NOAA Federal <
chris.bar...@noaa.gov> wrote:

> BTW:
>
> with pytest.raises(AnException):
> 
>
> Is another nice feature.
>
> -CHB
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Polynomial evaluation inconsistencies

2018-07-01 Thread Eric Wieser
I think the `x` is just noise there, especially if it's ignored (that is,
`T[0](x*2)` doesn't do anything reasonable).

Chebyshev.literal(lambda T: 1*T[0] + 2*T[1] + 3*T[2])

Would work, but honestly I don't think that provides much clarity. I think
the value here is mainly for "simple" polynomials.
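
A hypothetical usage - `literal` does not exist in numpy, and this assumes
the signature were tweaked to pass only the basis object:

    from numpy.polynomial import Chebyshev

    p = Chebyshev.literal(lambda T: 1*T[0] + 2*T[1] + 3*T[2])  # hypothetical
    # would be equivalent to
    p = Chebyshev([1, 2, 3])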

On Sun, 1 Jul 2018 at 23:42 Maxwell Aifer  wrote:

> Say we add a constructor to the polynomial base class that looks something
> like this:
>
>
> ---
>    @classmethod
>    def literal(cls, f):
>        def basis_function_getter(self, deg):
>            coefs = [0]*deg + [1]
>            return lambda _: cls(coefs)
>        basis = type('', (object,), {'__getitem__': basis_function_getter})()
>        return f(basis, None)
>
> ---
>
>
> Then the repr for, say, a Chebyshev polynomial could look like this:
>
> >>> Chebyshev.literal(lambda T,x: 1*T[0](x) + 2*T[1](x) + 3*T[2](x))
>
> Does this sound like a good idea to anyone?
>
> Max
>
>
> On Sat, Jun 30, 2018 at 6:47 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Sat, Jun 30, 2018 at 4:42 PM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>>
>>>
>>> On Sat, Jun 30, 2018 at 3:41 PM, Eric Wieser <
>>> wieser.eric+nu...@gmail.com> wrote:
>>>
>>>> Since one of the arguments for the decreasing order seems to just
>>>> be textual representation - do we want to tweak the repr to something like
>>>>
>>>> Polynomial(lambda x: 2*x**3 + 3*x**2 + x + 0)
>>>>
>>>> (And add a constructor that calls the lambda with Polynomial(1))
>>>>
>>>> Eric
>>>>
>>>
>>> IIRC there was a proposal for that. There is the possibility of adding
>>> renderers for latex and html that could be used by Jupyter, and I think the
>>> ordering was an option.
>>>
>>
>> See https://github.com/numpy/numpy/issues/8893 for the proposal. BTW, if
>> someone would like to work on this, go for it.
>>
>> Chuck
>>
>>> ​
>>>>
>>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Polynomial evaluation inconsistencies

2018-06-30 Thread Eric Wieser
Good catch, it would do that
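
For the record, evaluating such a lambda at the identity polynomial does
round-trip:

    from numpy.polynomial import Polynomial

    x = Polynomial([0, 1])   # the identity polynomial, p(x) = x
    p = (lambda x: 2*x**3 + 3*x**2 + x + 0)(x)
    p.coef                   # array([0., 1., 3., 2.])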

On Sat, 30 Jun 2018 at 15:07 Maxwell Aifer  wrote:

> *shouldn't the constructor call the lambda with Polynomial([0,1])
>
> On Sat, Jun 30, 2018 at 6:05 PM, Maxwell Aifer 
> wrote:
>
>> Oh, clever... yeah I think that would be very cool. But shouldn't it call
>> the constructor with Polynomial([0,1])?
>>
>> On Sat, Jun 30, 2018 at 5:41 PM, Eric Wieser wrote:
>>
>>> Since one of the arguments for the decreasing order seems to just be
>>> textual representation - do we want to tweak the repr to something like
>>>
>>> Polynomial(lambda x: 2*x**3 + 3*x**2 + x + 0)
>>>
>>> (And add a constructor that calls the lambda with Polynomial(1))
>>>
>>> Eric
>>> ​
>>>
>>> On Sat, 30 Jun 2018 at 14:30 Eric Wieser 
>>> wrote:
>>>
>>>> “the intuitive way” is the decreasing powers.
>>>>
>>>> An argument against this is that accessing the ith power of x is spelt:
>>>>
>>>>- x.coeffs[i] for increasing powers
>>>>- x.coeffs[-i-1] for decreasing powers
>>>>
>>>> The former is far more natural than the latter, and avoids a potential
>>>> off-by-one error
>>>>
>>>> If I ask someone to write down the coefficients of a polynomial I don’t
>>>> think anyone would start from c[2]
>>>>
>>>> You wouldn’t? I’d expect to see
>>>>
>>>> f(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0
>>>>
>>>> rather than
>>>>
>>>> f(x) = a_0 x^3 + a_1 x^2 + a_2 x + a_3
>>>>
>>>> Sure, I’d write it starting with the highest power, but I’d still
>>>> number my coefficients to match the powers.
>>>>
>>>>
>>>> Eric
>>>> ​
>>>>
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Polynomial evaluation inconsistencies

2018-06-30 Thread Eric Wieser
Since one of the arguments for the decreasing order seems to just be
textual representation - do we want to tweak the repr to something like

Polynomial(lambda x: 2*x**3 + 3*x**2 + x + 0)

(And add a constructor that calls the lambda with Polynomial(1))

Eric
​

On Sat, 30 Jun 2018 at 14:30 Eric Wieser 
wrote:

> “the intuitive way” is the decreasing powers.
>
> An argument against this is that accessing the ith power of x is spelt:
>
>- x.coeffs[i] for increasing powers
>- x.coeffs[-i-1] for decreasing powers
>
> The former is far more natural than the latter, and avoids a potential
> off-by-one error
>
> If I ask someone to write down the coefficients of a polynomial I don’t
> think anyone would start from c[2]
>
> You wouldn’t? I’d expect to see
>
> f(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0
>
> rather than
>
> f(x) = a_0 x^3 + a_1 x^2 + a_2 x + a_3
>
> Sure, I’d write it starting with the highest power, but I’d still number
> my coefficients to match the powers.
>
>
> Eric
> ​
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Polynomial evaluation inconsistencies

2018-06-30 Thread Eric Wieser
“the intuitive way” is the decreasing powers.

An argument against this is that accessing the ith power of x is spelt:

   - x.coeffs[i] for increasing powers
   - x.coeffs[-i-1] for decreasing powers

The former is far more natural than the latter, and avoids a potential
off-by-one error

If I ask someone to write down the coefficients of a polynomial I don’t
think anyone would start from c[2]

You wouldn’t? I’d expect to see

f(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0

rather than

f(x) = a_0 x^3 + a_1 x^2 + a_2 x + a_3

Sure, I’d write it starting with the highest power, but I’d still number my
coefficients to match the powers.


Eric
​
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Polynomial evaluation inconsistencies

2018-06-30 Thread Eric Wieser
>  if a single program uses both np.polyval() and np.polynomial.Polynomial,
it seems bound to cause unnecessary confusion.

Yes, I would recommend definitely not doing that!

> I still think it would make more sense for np.polyval() to use
conventional indexing

Unfortunately, it's too late for "making sense" to factor into the design.
`polyval` is being used in the wild, so we're stuck with it behaving the
way it does. At best, we can deprecate it and start telling people to move
from `np.polyval` over to `np.polynomial.polynomial.polyval`. Perhaps we
need to make this namespace less cumbersome in order for that to be a
reasonable option.
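
To make the trap concrete - the same coefficient list means two different
polynomials, and the argument order swaps as well:

    import numpy as np
    from numpy.polynomial import polynomial as P

    c = [1, 2, 3]

    np.polyval(c, 2)   # 11   - reads c as 1*x**2 + 2*x + 3
    P.polyval(2, c)    # 17.0 - reads c as 1 + 2*x + 3*x**2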

I also wonder if we want a more lightweight polynomial object without the
extra domain and range information, which seem like they make `Polynomial`
a more questionable drop-in replacement for `poly1d`.

Eric

On Sat, 30 Jun 2018 at 09:14 Maxwell Aifer  wrote:

> Thanks, that explains a lot! I didn't realize the reverse ordering
> actually originated with matlab's polyval, but that makes sense given the
> one-based indexing. I see why it is the way it is, but I still think it
> would make more sense for np.polyval() to use conventional indexing (c[0]
> * x^0 + c[1] * x^1 + c[2] * x^2). np.polyval() can be convenient when a
> polynomial object is just not needed, but if a single program uses both
> np.polyval() and np.polynomial.Polynomial, it seems bound to cause
> unnecessary confusion.
>
> Max
>
> On Fri, Jun 29, 2018 at 11:23 PM, Eric Wieser wrote:
>
>> Here's my take on this, but it may not be an accurate summary of the
>> history.
>>
>> `np.poly` is part of the original matlab-style API, built around
>> `poly1d` objects. This isn't a great design, because they represent:
>>
>> p(x) = c[0] * x^2 + c[1] * x^1 + c[2] * x^0
>>
>> For this reason, among others, the `np.polynomial` module was created,
>> starting with a clean slate. The core of this is
>> `np.polynomial.Polynomial`. There, everything uses the convention
>>
>> p(x) = c[0] * x^0 + c[1] * x^1 + c[2] * x^2
>>
>> It sounds like we might need clearer docs explaining the difference, and
>> pointing users to the more sensible `np.polynomial.Polynomial`
>>
>> Eric
>>
>>
>>
>> On Fri, 29 Jun 2018 at 20:10 Charles R Harris 
>> wrote:
>>
>>> On Fri, Jun 29, 2018 at 8:21 PM, Maxwell Aifer 
>>> wrote:
>>>
>>>> Hi,
>>>> I noticed some frustrating inconsistencies in the various ways to
>>>> evaluate polynomials using numpy. Numpy has three ways of evaluating
>>>> polynomials (that I know of) and each of them has a different syntax:
>>>>
>>>>- numpy.polynomial.polynomial.Polynomial
>>>>  <https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.polynomial.polynomial.Polynomial.html#numpy.polynomial.polynomial.Polynomial>:
>>>>  You define a polynomial by a list of coefficients *in order of
>>>>  increasing degree*, and then use the class’s call() function.
>>>>- np.polyval
>>>>  <https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.polyval.html>:
>>>>  Evaluates a polynomial at a point. *First* argument is the
>>>>  polynomial, or list of coefficients *in order of decreasing degree*,
>>>>  and the *second* argument is the point to evaluate at.
>>>>- np.polynomial.polynomial.polyval
>>>>  <https://docs.scipy.org/doc/numpy-1.12.0/reference/generated/numpy.polynomial.polynomial.polyval.html>:
>>>>  Also evaluates a polynomial at a point, but has more support for
>>>>  vectorization. *First* argument is the point to evaluate at, and
>>>>  *second* argument the list of coefficients *in order of increasing
>>>>  degree*.
>>>>
>>>> Not only the order of arguments is changed between different methods,
>>>> but the order of the coefficients is reversed as well, leading to puzzling
>>>> bugs (in my experience). What could be the reason for this madness? As
>>>> polyval is a shameless ripoff of Matlab’s function of the same name
>>>> <https://www.mathworks.com/help/matlab/ref/polyval.html> anyway, why
>>>> not just use matlab’s syntax (polyval([c0, c1, c2...], x)) across the
>>>> board?
>>>> ​
>>>>
>>>>
The polynomial package, with its various bases, deals with series, and
especially with the truncated series approximations that are used in
numerical work.

Re: [Numpy-discussion] Polynomial evaluation inconsistencies

2018-06-29 Thread Eric Wieser
Here's my take on this, but it may not be an accurate summary of the
history.

`np.poly` is part of the original matlab-style API, built around
`poly1d` objects. This isn't a great design, because they represent:

p(x) = c[0] * x^2 + c[1] * x^1 + c[2] * x^0

For this reason, among others, the `np.polynomial` module was created,
starting with a clean slate. The core of this is
`np.polynomial.Polynomial`. There, everything uses the convention

p(x) = c[0] * x^0 + c[1] * x^1 + c[2] * x^2

It sounds like we might need clearer docs explaining the difference, and
pointing users to the more sensible `np.polynomial.Polynomial`

Eric



On Fri, 29 Jun 2018 at 20:10 Charles R Harris 
wrote:

> On Fri, Jun 29, 2018 at 8:21 PM, Maxwell Aifer 
> wrote:
>
>> Hi,
>> I noticed some frustrating inconsistencies in the various ways to
>> evaluate polynomials using numpy. Numpy has three ways of evaluating
>> polynomials (that I know of) and each of them has a different syntax:
>>
>>- numpy.polynomial.polynomial.Polynomial
>>  <https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.polynomial.polynomial.Polynomial.html#numpy.polynomial.polynomial.Polynomial>:
>>  You define a polynomial by a list of coefficients *in order of
>>  increasing degree*, and then use the class’s call() function.
>>- np.polyval
>>  <https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.polyval.html>:
>>  Evaluates a polynomial at a point. *First* argument is the
>>  polynomial, or list of coefficients *in order of decreasing degree*,
>>  and the *second* argument is the point to evaluate at.
>>- np.polynomial.polynomial.polyval
>>  <https://docs.scipy.org/doc/numpy-1.12.0/reference/generated/numpy.polynomial.polynomial.polyval.html>:
>>  Also evaluates a polynomial at a point, but has more support for
>>  vectorization. *First* argument is the point to evaluate at, and
>>  *second* argument the list of coefficients *in order of increasing
>>  degree*.
>>
>> Not only the order of arguments is changed between different methods, but
>> the order of the coefficients is reversed as well, leading to puzzling bugs
>> (in my experience). What could be the reason for this madness? As polyval
>> is a shameless ripoff of Matlab’s function of the same name
>> <https://www.mathworks.com/help/matlab/ref/polyval.html> anyway, why not
>> just use matlab’s syntax (polyval([c0, c1, c2...], x)) across the board?
>> ​
>>
>>
> The polynomial package, with its various bases, deals with series, and
> especially with the truncated series approximations that are used in
> numerical work. Series are universally written in increasing order of the
> degree. The Polynomial class is efficient in a single variable, while the
> numpy.polynomial.polynomial.polyval function is intended as a building
> block and can also deal with multivariate polynomials or multidimensional
> arrays of polynomials, or a mix. See the simple implementation of polyval3d
> for an example. If you are just dealing with a single variable, use
> Polynomial, which will also track scaling and offsets for numerical
> stability and is generally much superior to the simple polyval function
> from a numerical point of view.
>
> As to the ordering of the degrees, learning that the degree matches the
> index is pretty easy and is a more natural fit for the implementation code,
> especially as the number of variables increases. I note that Matlab has
> one-based indexing, so that was really not an option for them.
>
> Chuck
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Revised NEP-18, __array_function__ protocol

2018-06-29 Thread Eric Wieser
Good catch,

I think the latter failing is because np.add.reduce ends up calling
np.ufunc.reduce.__get__(np.add), and builtin_function.__get__ doesn’t
appear to do any caching. I suppose caching bound methods would just be a
waste of time.

`==` would work just fine in my suggestion above, it seems - irrespective of
the resolution of the discussion on python-dev.
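
The same behaviour is visible with any Python method - each attribute access
creates a fresh bound-method object, while == compares __self__ and __func__:

    class A:
        def f(self):
            pass

    a = A()
    a.f is a.f   # False - a new bound method on every access
    a.f == a.f   # True  - equality compares __self__ and __func__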

Eric
​

On Fri, 29 Jun 2018 at 18:24 Stephan Hoyer  wrote:

> On Thu, Jun 28, 2018 at 5:36 PM Eric Wieser 
> wrote:
>
>> Another option would be to directly compare the methods against known
>> ones:
>>
>> obj = func.__self__
>> if isinstance(obj, np.ufunc):
>>     if func is obj.reduce:
>>         got_reduction()
>>
>> I'm not quite sure why, but this doesn't seem to work with current ufunc
> objects:
>
> >>> np.add.reduce == np.add.reduce  # OK
> True
>
> >>> np.add.reduce is np.add.reduce  # what?!?
> False
>
> Maybe this is a bug? There's been some somewhat related discussion
> recently on python-dev:
> https://mail.python.org/pipermail/python-dev/2018-June/153959.html
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Revised NEP-18, __array_function__ protocol

2018-06-28 Thread Eric Wieser
Another option would be to directly compare the methods against known ones:

obj = func.__self__
if isinstance(obj, np.ufunc):
    if func is obj.reduce:
        got_reduction()

Eric
​

On Thu, 28 Jun 2018 at 17:19 Stephan Hoyer  wrote:

> On Thu, Jun 28, 2018 at 1:12 PM Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> For C classes like the ufuncs, it seems `__self__` is defined for methods
>> as well (at least, `np.add.reduce.__self__` gives `np.add`), but not a
>> `__func__`. There is a `__name__` (="reduce"), though, which means that I
>> think one can still retrieve what is needed (obviously, this also means
>> `__array_ufunc__` could have been simpler...)
>>
>
> Good point!
>
> I guess this means we should encourage using __name__ rather than
> __func__. I would not want to preclude refactoring classes from Python to
> C/Cython.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing

2018-06-26 Thread Eric Wieser
We can expose some of the internals

These could be expressed as methods on the internal indexing objects I
proposed in the first reply to this thread, which has seen no responses.

I think Hameer Abbasi is looking for something like
`OrthogonalIndexer(...).to_vindex() -> VectorizedIndexer`, such that
`arr.oindex[ind]` selects the same elements as
`arr.vindex[OrthogonalIndexer(ind).to_vindex()]`.
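
For the pure integer-array case, numpy already exposes exactly this
outer-to-vectorized conversion as np.ix_:

    import numpy as np

    arr = np.arange(12).reshape(3, 4)
    rows, cols = np.array([0, 2]), np.array([1, 3])

    arr[np.ix_(rows, cols)]            # orthogonal selection, shape (2, 2)
    arr[rows[:, None], cols[None, :]]  # the same elements via broadcasting

A general to_vindex() would additionally have to handle slices, which np.ix_
does not.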

Eric
​

On Tue, 26 Jun 2018 at 08:04 Sebastian Berg 
wrote:

> On Tue, 2018-06-26 at 04:01 -0400, Hameer Abbasi wrote:
> > I second this design. If we were to consider the general case of a
> > tuple `idx`, then we’d not be moving forward at all. Design changes
> > would be impossible. I’d argue that this newer model would be easier
> > for library maintainers overall (who are the kind of people using
> > this), reducing maintenance cost in the long run because it’d lead to
> > simpler code.
> >
> > I would also say that the “internal” classes expressing outer as
> > vectorised indexing etc. should be exposed, for maintainers of duck
> > arrays to use. God knows how many utility functions I’ve had to write
> > to avoid relying on undocumented NumPy internals for pydata/sparse,
> > fearing that I’d have to rewrite/modify them when behaviour changes
> > or I find other corner cases.
>
> Could you list some examples what you would need? We can expose some of
> the internals, or maybe even provide funcs to map e.g. oindex to vindex
> or vindex to plain indexing, etc. but it would be helpful to know what
> downstream actually might need. For all I know the things that you are
> thinking of may not even exist...
>
> - Sebastian
>
>
>
> >
> > Best Regards,
> > Hameer Abbasi
> > Sent from Astro for Mac
> >
> > > On 26. Jun 2018 at 09:46, Robert Kern 
> > > wrote:
> > >
> > > On Tue, Jun 26, 2018 at 12:13 AM Eric Wieser wrote:
> > > > > I don't think it should be relegated to the "officially
> > > > discouraged" ghetto of `.legacy_index`
> > > >
> > > > The way I read it, the new spelling lof that would be the
> > > > explicit but not discouraged `image.vindex[rr, cc]`.
> > > >
> > >
> > > Okay, I missed that the first time through. I think having more
> > > self-contained descriptions of the semantics of each of these would
> > > be a good idea. The current description of `.vindex` spends more
> > > time talking about what it doesn't do, compared to the other
> > > methods, than what it does.
> > >
> > > Some more typical, less-exotic examples would be a good idea.
> > >
> > > > > I would reserve warnings for the cases where the current
> > > > behavior is something no one really wants, like mixing slices and
> > > > integer arrays.
> > > >
> > > > These are the cases that would only be available under
> > > > `legacy_index`.
> > > >
> > >
> > > I'm still leaning towards not warning on current, unproblematic
> > > common uses. It's unnecessary churn for currently working,
> > > understandable code. I would still reserve warnings and deprecation
> > > for the cases where the current behavior gives us something that no
> > > one wants. Those are the real traps that people need to be warned
> > > away from.
> > >
> > > If someone is mixing slices and integer indices, that's a really
> > > good sign that they thought indexing behaved in a different way
> > > (e.g. orthogonal indexing).
> > >
> > > If someone is just using multiple index arrays that would currently
> > > not give an error, that's actually a really good sign that they are
> > > using it correctly and are getting the semantics that they desired.
> > > If they wanted orthogonal indexing, it is *really* likely that
> > > their index arrays would *not* broadcast together. And even if they
> > > did, the wrong shape of the result is one of the more easily
> > > noticed things. These are not silent errors that would motivate
> > > adding a new warning.
> > >
> > > --
> > > Robert Kern
> > >
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing

2018-06-26 Thread Eric Wieser
Another thing I’d say is arr.?index should be replaced with arr.?idx.

Or perhaps arr.o_[] and arr.v_[], to match the style of our existing
np.r_, np.c_, np.s_, etc?
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing

2018-06-26 Thread Eric Wieser
> I don't think it should be relegated to the "officially discouraged"
ghetto of `.legacy_index`

The way I read it, the new spelling lof that would be the explicit but not
discouraged `image.vindex[rr, cc]`.

> I would reserve warnings for the cases where the current behavior is
something no one really wants, like mixing slices and integer arrays.

These are the cases that would only be available under `legacy_index`.
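
For anyone unsure why that mixed case is a trap - when integer-array indices
are separated by a slice, the broadcast dimension moves to the front of the
result:

    import numpy as np

    a = np.arange(24).reshape(2, 3, 4)
    a[np.array([0, 1]), :, np.array([0, 3])].shape   # (2, 3)
    # the index dimension lands first, not where the arrays were -
    # rarely what was intended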

Eric

On Mon, 25 Jun 2018 at 23:54 Robert Kern  wrote:

> On Mon, Jun 25, 2018 at 11:29 PM Andrew Nelson  wrote:
>
>> On Tue, 26 Jun 2018 at 16:24, Juan Nunez-Iglesias 
>> wrote:
>>
>>> > Plain indexing arr[...] should return an error for ambiguous cases.
>>> [...] This includes every use of vectorized indexing with multiple integer
>>> arrays.
>>>
>>> This line concerns me. In scikit-image, we often do:
>>>
>>> rr, cc = coords.T  # coords is an (n, 2) array of integer coordinates
>>> values = image[rr, cc]
>>>
>>> Are you saying that this use is deprecated? Because we love it at
>>> scikit-image. I would be very very very sad to lose this syntax.
>>>
>>
>>  I second Juan's sentiments wholeheartedly here.
>>
>
> And thirded. This should not be considered deprecated or discouraged. As I
> mentioned in the previous iteration of this discussion, this is the
> behavior I want more often than the orthogonal indexing. It's a really
> common way to work with images and other kinds of raster data, so I don't
> think it should be relegated to the "officially discouraged" ghetto of
> `.legacy_index`. It should not issue warnings or (eventual) errors. I would
> reserve warnings for the cases where the current behavior is something no
> one really wants, like mixing slices and integer arrays.
>
>
> --
> Robert Kern
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP 21: Simplified and explicit advanced indexing

2018-06-25 Thread Eric Wieser
Generally +1 on this, but I don’t think we need

To ensure that existing subclasses of ndarray that override indexing
do not inadvertently revert to default behavior for indexing attributes,
these attribute should have explicit checks that disable them if
__getitem__ or __setitem__ has been overriden.

Repeating my proposal from github, I think we should introduce some
internal indexing objects - something simple like:

# np.core.*
class Indexer(object):  # importantly not iterable
    def __init__(self, value):
        self.value = value

class OrthogonalIndexer(Indexer): pass
class VectorizedIndexer(Indexer): pass

Keeping the proposed syntax, we’d implement:

   - arr.oindex[ind] as arr[np.core.OrthogonalIndexer(ind)]
   - arr.vindex[ind] as arr[np.core.VectorizedIndexer(ind)]

This means that subclasses like the following

class LoggingIndexer(np.ndarray):
    def __getitem__(self, ind):
        ret = super().__getitem__(ind)
        print("Got an index")
        return ret

will continue to work without issues. This includes np.ma.MaskedArray and
np.memmap, so this already has value internally.

For classes like np.matrix which inspect the index object itself, an error
will still be raised from __getitem__, since it looks nothing like the
values normally passed - most likely of the form

TypeError: 'numpy.core.VectorizedIndexer' object does not support indexing
TypeError: 'numpy.core.VectorizedIndexer' object is not iterable

This could potentially be caught in oindex.__getitem__ and converted into a
more useful error message.

So to summarize the benefits of the above tweaks:

   - Pass-through subclasses get the new behavior for free
   - No additional descriptor helpers are needed to let non-passthrough
   subclasses implement the new indexable attributes - only a change to
   __getitem__ is needed

And the costs:

   - A less clear error message when new indexing is used on old types (can
   chain with a more useful exception on python 3)
   - Class construction overhead for indexing via the attributes (skippable
   for base ndarray if significant)
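
To spell out the claim that only __getitem__/__setitem__ need changing, the
attribute itself could be as simple as this (names hypothetical, reusing the
Indexer classes sketched above):

    class _OIndexHelper:
        def __init__(self, arr):
            self.arr = arr

        def __getitem__(self, ind):
            # all the real work stays in the array's own __getitem__
            return self.arr[OrthogonalIndexer(ind)]

        def __setitem__(self, ind, value):
            self.arr[OrthogonalIndexer(ind)] = value

    # on ndarray, roughly: oindex = property(_OIndexHelper)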

Eric
​

On Mon, 25 Jun 2018 at 14:30 Stephan Hoyer  wrote:

> Sebastian and I have revised a Numpy Enhancement Proposal that he started
> three years ago for overhauling NumPy's advanced indexing. We'd now like to
> present it for official consideration.
>
> Minor inline comments (e.g., typos) can be added to the latest pull
> request (https://github.com/numpy/numpy/pull/11414/files), but otherwise
> let's keep discussion on the mailing list. The NumPy website should update
> shortly with a rendered version (
> http://www.numpy.org/neps/nep-0021-advanced-indexing.html), but until
> then please see the full text below.
>
> Cheers,
> Stephan
>
> =
> Simplified and explicit advanced indexing
> =
>
> :Author: Sebastian Berg
> :Author: Stephan Hoyer 
> :Status: Draft
> :Type: Standards Track
> :Created: 2015-08-27
>
>
> Abstract
> 
>
> NumPy's "advanced" indexing support for indexing arrays with other arrays
> is
> one of its most powerful and popular features. Unfortunately, the existing
> rules for advanced indexing with multiple array indices are typically
> confusing
> to both new, and in many cases even old, users of NumPy. Here we propose an
> overhaul and simplification of advanced indexing, including two new
> "indexer"
> attributes ``oindex`` and ``vindex`` to facilitate explicit indexing.
>
> Background
> --
>
> Existing indexing operations
> 
>
> NumPy arrays currently support a flexible range of indexing operations:
>
> - "Basic" indexing involving only slices, integers, ``np.newaxis`` and
> ellipsis
>   (``...``), e.g., ``x[0, :3, np.newaxis]`` for selecting the first element
>   from the 0th axis, the first three elements from the 1st axis and
> inserting a
>   new axis of size 1 at the end. Basic indexing always returns a view of the
>   indexed array's data.
> - "Advanced" indexing, also called "fancy" indexing, includes all cases
> where
>   arrays are indexed by other arrays. Advanced indexing always makes a
> copy:
>
>   - "Boolean" indexing by boolean arrays, e.g., ``x[x > 0]`` for
> selecting positive elements.
>   - "Vectorized" indexing by one or more integer arrays, e.g., ``x[[0,
> 1]]``
> for selecting the first two elements along the first axis. With
> multiple
> arrays, vectorized indexing uses broadcasting rules to combine indices
> along
> multiple dimensions. This allows for producing a result of arbitrary
> shape
> with arbitrary elements from the original arrays.
>   - "Mixed" indexing involving any combinations of the other advancing
> types.
> This is no more powerful than vectorized indexing, but is sometimes
> more
> convenient.
>
> For clarity, we will refer to these existing rules as "legacy indexing".
> This is only a high-level summary; for more details, see NumPy's
> documentation.

Re: [Numpy-discussion] Remove sctypeNA and typeNA from numpy core

2018-06-21 Thread Eric Wieser
>  I bet the NA is for the missing value support that never happened

Nope - NA stands for NumArray

Eric

On Thu, 21 Jun 2018 at 11:07 Sebastian Berg 
wrote:

> On Thu, 2018-06-21 at 09:25 -0700, Matti Picus wrote:
> > numpy.core has many ways to catalogue dtype names: sctypeDict,
> > typeDict
> > (which is precisely sctypeDict), typecodes, and typename. We also
> > generate sctypeNA and typeNA but, as issue 11241 shows, it is
> > sometimes
> > wrong. They are also not documented and never used inside numpy.
> > Instead
> > of fixing it, I propose to remove sctypeNA and typeNA.
> >
>
> Sounds like a good idea, we have too much stuff in there, and this one
> is not even useful (I bet the NA is for the missing value support that
> never happened).
>
> Might be good to do a quick deprecation anyway though, mostly out of
> principle.
>
> - Sebastian
>
> > Any thoughts or objections?
> > Matti
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs

2018-06-12 Thread Eric Wieser
I don’t understand your alternative here. If we overload np.matmul using
__array_function__, then it would not use *either* of these options for
writing the operation in terms of other gufuncs. It would simply look for
an __array_function__ attribute, and call that method instead.

Let me explain that suggestion a little more clearly.

   1. There’d be a linalg.matmul2d that performs the real matrix case,
   which would be easy to make as a ufunc right now.
   2. __matmul__ and __rmatmul__ would just call np.matmul, as they
   currently do (for consistency between np.matmul and operator.matmul,
   needed in python pre-@-operator)
   3. np.matmul would be implemented as:

   @do_array_function_overrides
   def matmul(a, b):
       if a.ndim != 1 and b.ndim != 1:
           return matmul2d(a, b)
       elif a.ndim != 1:
           # b is 1d: treat it as a column vector, then drop that axis
           return matmul2d(a, b[:, None])[..., 0]
       elif b.ndim != 1:
           # a is 1d: treat it as a row vector, then drop that axis
           return matmul2d(a[None, :], b)[..., 0, :]
       else:
           # this one probably deserves its own ufunc
           return matmul2d(a[None, :], b[:, None])[0, 0]

   4. Quantity can just override __array_ufunc__ as with any other ufunc
   5. DataArray, knowing the above doesn’t work, would implement something
   like

   @matmul.register_array_function(DataArray)
   def __array_function__(a, b):
       if a.ndim != 1 and b.ndim != 1:
           return matmul2d(a, b)
       else:
           # either:
           # - add/remove dummy dimensions in a dataarray-specific way
           # - downcast to ndarray and do the dimension juggling there
           ...


Advantages of this approach:

   - Neither the ufunc machinery, nor __array_ufunc__, nor the inner loop,
     need to know about optional dimensions.
   - We get a matmul2d ufunc that all subclasses support out of the box if
     they support matmul.

Eric
​
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs

2018-06-11 Thread Eric Wieser
Frozen dimensions:

I started with just making every 3-vector and 3x3-matrix structured arrays
with the relevant single sub-array entry

I was actually suggesting omitting the structured dtype (ie, field names)
altogether, and just using the subarray dtypes (which exist alone, but not
in arrays).

Another (small?) advantage is that I can use `axis`

This is a fair argument against my proposal - at any rate, I think we’d
need a better story for subarray dtypes before trying to add support to
them for ufuncs.
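
A quick illustration of that parenthetical, for anyone unfamiliar - a
subarray dtype is a real dtype object, but it decays as soon as an array is
built from it:

    import numpy as np

    vec3 = np.dtype((np.float64, 3))   # a standalone subarray dtype

    np.zeros(2, dtype=vec3).shape      # (2, 3) - the subarray axis is absorbed
    np.zeros(2, dtype=vec3).dtype      # dtype('float64'), not vec3

    # it only survives as a field of a structured dtype:
    np.zeros(2, dtype=[('v', np.float64, 3)])['v'].shape   # (2, 3)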
--

Broadcasting dimensions

But perhaps a putative weighted_mean … is a decent example

That’s fairly convincing as a non-chained ufunc case. Can you add an
example like that to the NEP?

Also, it has the benefit of being clear what the function can handle by
inspection of the signature

Is broadcasting (n),(n)->(),() less clear than (n|1),(n|1)->(),()? Can you
come up with an example where only some dimensions make sense to broadcast?
--

Eric
​

On Mon, 11 Jun 2018 at 08:04 Marten van Kerkwijk 
wrote:

> Nathaniel:
>>
>> Output shape feels very similar to
>> output dtype to me, so maybe the general way to handle this would be
>> to make the first callback take the input shapes+dtypes and return the
>> desired output shapes+dtypes?
>>
>> This hits on an interesting alternative to frozen dimensions - np.cross
>> could just become a regular ufunc with signature np.dtype((float64, 3)),
>> np.dtype((float64, 3)) → np.dtype((float64, 3))
>>
> As you note further down, the present proposal of just using numbers has
> the advantage of being clear and easy. Another (small?) advantage is that I
> can use `axis` to tell where my three coordinates are, rather than be stuck
> with having them as the last dimension.
>
> Indeed, in my trials for wrapping the Standards Of Fundamental Astronomy
> routines, I started with just making every 3-vector and 3x3-matrix
> structured arrays with the relevant single sub-array entry. That worked,
> but I ended up disliking the casting to and fro.
>
>
>> Furthermore, the expansion quickly becomes cumbersome. For instance, for
>> the all_equal signature of (n|1),(n|1)->() …
>>
>> I think this is only a good argument when used in conjunction with the
>> broadcasting syntax. I don’t think it’s a reason for matmul not to have
>> multiple signatures. Having multiple signatures is a disincentive to
>> introducing too many overloads of the same function, which seems like a good
>> thing to me
>>
> But implementation for matmul is actually considerably trickier, since the
> internal loop now has to check the number of distinct dimensions.
>
>
>> Summarizing my overall opinions:
>>
>>- I’m +0.5 on frozen dimensions. The use-cases seem reasonable, and
>>it seems like an easy-ish way to get them. Allowing ufuncs to natively
>>support subarray types might be a tidier solution, but that could come 
>> down
>>the road
>>
>> Indeed, they are not mutually exclusive. My guess would be that the use
> cases would be somewhat different.
>
>
>>
>>- I’m -1 on optional dimensions: they seem to legitimize creating
>>many overloads of gufuncs. I’m already not a fan of how matmul has special
>>cases for lower dimensions that don’t generalize well. To me, the best way
>>to handle matmul would be to use the proposed __array_function__ to
>>handle the shape-based special-case dispatching, either by:
>>   - Inserting dimensions, and calling the true gufunc
>>   np.linalg.matmul_2d (which is a function I’d like direct access to
>>   anyway).
>>   - Dispatching to one of four ufuncs
>>
>> I must admit I wish that `@` was just pure matrix multiplication...  But
> otherwise agree with Stephan as optional dimensions being the least-bad
> solution.
>
> Aside: do agree we should think about how to expose the `linalg` gufuncs.
>
>>
>>- Broadcasting dimensions:
>>   - I know you’re not suggesting this but: enabling broadcasting
>>   unconditionally for all gufuncs would be a bad idea, masking linalg 
>> bugs.
>>   (although einsum does support broadcasting…)
>>
>> Indeed, definitely *not* suggesting that!
>
>
>>
>>   - Does it really need a per-dimension flag, rather than a global
>>   one? Can you give a case where that’s useful?
>>
>> Mostly simply that the implementation is easier given the optional
> dimensions... Also, it has the benefit of being clear what the function can
> handle by inspection of the signature, i.e., it self-documents better (one
> of my main arguments in favour of frozen dimensions...).
>
>
>>
>>   - If we’d already made all_equal a gufunc, I’d be +1 on adding
>>   broadcasting support to it
>>   - I’m -0.5 on the all_equal path in the first place. I think we
>>   either should have a more generic approach to combined ufuncs, or just
>>   declare them numba's job.
>>
>> I am working on and off on a way to generically chain ufuncs

Re: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs

2018-06-10 Thread Eric Wieser
Thanks for the writeup Marten,

Nathaniel:

Output shape feels very similar to
output dtype to me, so maybe the general way to handle this would be
to make the first callback take the input shapes+dtypes and return the
desired output shapes+dtypes?

This hits on an interesting alternative to frozen dimensions - np.cross
could just become a regular ufunc with signature np.dtype((float64, 3)),
np.dtype((float64, 3)) → np.dtype((float64, 3))

Furthermore, the expansion quickly becomes cumbersome. For instance, for
the all_equal signature of (n|1),(n|1)->() …

I think this is only a good argument when used in conjunction with the
broadcasting syntax. I don’t think it’s a reason for matmul not to have
multiple signatures. Having multiple signatures is a disincentive to
introducing too many overloads of the same function, which seems like a good
thing to me

Summarizing my overall opinions:

   - I’m +0.5 on frozen dimensions. The use-cases seem reasonable, and it
   seems like an easy-ish way to get them. Allowing ufuncs to natively support
   subarray types might be a tidier solution, but that could come down the road
   - I’m -1 on optional dimensions: they seem to legitimize creating many
   overloads of gufuncs. I’m already not a fan of how matmul has special cases
   for lower dimensions that don’t generalize well. To me, the best way to
   handle matmul would be to use the proposed __array_function__ to handle
   the shape-based special-case dispatching, either by:
  - Inserting dimensions, and calling the true gufunc
  np.linalg.matmul_2d (which is a function I’d like direct access to
  anyway).
  - Dispatching to one of four ufuncs
   - Broadcasting dimensions:
  - I know you’re not suggesting this but: enabling broadcasting
  unconditionally for all gufuncs would be a bad idea, masking linalg bugs.
  (although einsum does support broadcasting…)
  - Does it really need a per-dimension flag, rather than a global one?
  Can you give a case where that’s useful?
  - If we’d already made all_equal a gufunc, I’d be +1 on adding
  broadcasting support to it
  - I’m -0.5 on the all_equal path in the first place. I think we
  either should have a more generic approach to combined ufuncs, or just
   declare them numba's job.
  - Can you come up with a broadcasting use-case that isn’t just
  chaining a reduction with a broadcasting ufunc?

Eric

On Sun, 10 Jun 2018 at 16:02 Eric Wieser 
wrote:

Rendered here:
> https://github.com/mhvk/numpy/blob/nep-gufunc-signature-enhancement/doc/neps/nep-0020-gufunc-signature-enhancement.rst
>
>
> Eric
>
> On Sun, 10 Jun 2018 at 09:37 Marten van Kerkwijk <
> m.h.vankerkw...@gmail.com> wrote:
>
>> OK, I spent my Sunday morning writing a NEP. I hope this can lead to some
>> closure...
>> See https://github.com/numpy/numpy/pull/11297
>> -- Marten
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> ​
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs

2018-06-10 Thread Eric Wieser
Rendered here:
https://github.com/mhvk/numpy/blob/nep-gufunc-signature-enhancement/doc/neps/nep-0020-gufunc-signature-enhancement.rst


Eric

On Sun, 10 Jun 2018 at 09:37 Marten van Kerkwijk 
wrote:

> OK, I spent my Sunday morning writing a NEP. I hope this can lead to some
> closure...
> See https://github.com/numpy/numpy/pull/11297
> -- Marten
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP: Random Number Generator Policy

2018-06-03 Thread Eric Wieser
You make a bunch of good points refuting reproducible research as an
argument for not changing the random number streams.

However, there’s a second use-case you don’t address - unit tests. For
better or worse, downstream unit tests, and even our own, use a seeded
random number generator as a shorthand to produce some arbitrary array, and
then hard-code the expected output in their tests.
Breaking stream compatibility will break these tests.

I don’t think writing tests in this way is a particularly good idea, but
unfortunately such tests do still exist.

It would be good to address this use case in the NEP, even if the
conclusion is just “changing the stream will break tests of this form”

Eric

On Sat, 2 Jun 2018 at 12:05 Robert Kern robert.k...@gmail.com
 wrote:

As promised distressingly many months ago, I have written up a NEP about
> relaxing the stream-compatibility policy that we currently have.
>
> https://github.com/numpy/numpy/pull/11229
>
> https://github.com/rkern/numpy/blob/nep/rng/doc/neps/nep-0019-rng-policy.rst
>
> I particularly invite comment on the two lists of methods that we still
> would make strict compatibility guarantees for.
>
> ---
>
> ==
> Random Number Generator Policy
> ==
>
> :Author: Robert Kern 
> :Status: Draft
> :Type: Standards Track
> :Created: 2018-05-24
>
>
> Abstract
> 
>
> For the past decade, NumPy has had a strict backwards compatibility policy
> for
> the number stream of all of its random number distributions.  Unlike other
> numerical components in ``numpy``, which are usually allowed to return
> different results when they are modified if they remain correct, we
> have
> obligated the random number distributions to always produce the exact same
> numbers in every version.  The objective of our stream-compatibility
> guarantee
> was to provide exact reproducibility for simulations across numpy versions
> in
> order to promote reproducible research.  However, this policy has made it
> very
> difficult to enhance any of the distributions with faster or more accurate
> algorithms.  After a decade of experience and improvements in the
> surrounding
> ecosystem of scientific software, we believe that there are now better
> ways to
> achieve these objectives.  We propose relaxing our strict
> stream-compatibility
> policy to remove the obstacles that are in the way of accepting
> contributions
> to our random number generation capabilities.
>
>
> The Status Quo
> --
>
> Our current policy, in full:
>
> A fixed seed and a fixed series of calls to ``RandomState`` methods
> using the
> same parameters will always produce the same results up to roundoff
> error
> except when the values were incorrect.  Incorrect values will be fixed
> and
> the NumPy version in which the fix was made will be noted in the
> relevant
> docstring.  Extension of existing parameter ranges and the addition of
> new
> parameters is allowed as long as the previous behavior remains unchanged.
>
> This policy was first instated in Nov 2008 (in essence; the full set of
> weasel
> words grew over time) in response to a user wanting to be sure that the
> simulations that formed the basis of their scientific publication could be
> reproduced years later, exactly, with whatever version of ``numpy`` that
> was
> current at the time.  We were keen to support reproducible research, and
> it was
> still early in the life of ``numpy.random``.  We had not seen much cause to
> change the distribution methods all that much.
>
> We also had not thought very thoroughly about the limits of what we really
> could promise (and by “we” in this section, we really mean Robert Kern,
> let’s
> be honest).  Despite all of the weasel words, our policy overpromises
> compatibility.  The same version of ``numpy`` built on different
> platforms, or
> just in a different way could cause changes in the stream, with varying
> degrees
> of rarity.  The biggest is that the ``.multivariate_normal()`` method
> relies on
> ``numpy.linalg`` functions.  Even on the same platform, if one links
> ``numpy``
> with a different LAPACK, ``.multivariate_normal()`` may well return
> completely
> different results.  More rarely, building on a different OS or CPU can
> cause
> differences in the stream.  We use C ``long`` integers internally for
> integer
> distribution (it seemed like a good idea at the time), and those can vary
> in
> size depending on the platform.  Distribution methods can overflow their
> internal C ``longs`` at different breakpoints depending on the platform and
> cause all of the random variate draws that follow to be different.
>
> And even if all of that is controlled, our policy still does not provide
> exact
> guarantees across versions.  We still do apply bug fixes when correctness
> is at stake.

[Numpy-discussion] Adding take_along_axis and put_along_axis functions

2018-05-28 Thread Eric Wieser
These functions provide a vectorized way of using one array to look up
items in another. In particular, they extend the 1d case:

a = np.array([4, 5, 6, 1, 2, 3])
b = np.array(["four", "five", "six", "one", "two", "three"])
i = a.argsort()
b_sorted = b[i]

To work for higher-dimensions:

a = np.array([[4, 1], [5, 2], [6, 3]])
b = np.array([["four", "one"],  ["five", "two"], ["six", "three"]])
i = a.argsort(axis=1)
b_sorted = np.take_along_axis(b, i, axis=1)

put_along_axis is the obvious but less useful dual to this operation,
inserting elements rather than extracting them. (Unlike put and take which
are not obvious duals).
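
For completeness, a small put_along_axis example - marking the maximum of
each row in place:

    import numpy as np

    a = np.array([[10, 30, 20],
                  [60, 40, 50]])
    i = np.expand_dims(np.argmax(a, axis=1), axis=1)
    np.put_along_axis(a, i, -1, axis=1)
    # a is now [[10, -1, 20],
    #           [-1, 40, 50]]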

These have been merged in gh-11105, but as a new addition this probably
should have gone by the mailing list first.

There was a lack of consensus in gh-8714 about how best to generalize to
differing dimensions, so only the non-controversial case where the indices
and array have the same dimensions was implemented.

These names were chosen to mirror apply_along_axis, which behaves
similarly. Do they seem reasonable?

Eric
​
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] matmul as a ufunc

2018-05-28 Thread Eric Wieser
which ensures that it is still well defined (as the identity) on 1d arrays.

This strikes me as a bad idea. There’s already enough confusion from
beginners that array_1d.T is a no-op. If we introduce a matrix-transpose,
it should either error on <1d inputs with a useful message, or insert the
extra dimension. I’d favor the former.
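
The confusion in question:

    import numpy as np

    v = np.arange(3)
    v.T.shape         # (3,) - .T on a 1d array silently does nothing
    v[:, None].shape  # (3, 1) - an explicit column vector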

Eric

On Mon, 28 May 2018 at 16:27 Stephan Hoyer sho...@gmail.com
 wrote:

On Mon, May 21, 2018 at 5:42 PM Matti Picus  wrote:
>
>> - create a wrapper that can convince the ufunc mechanism to call
>> __array_ufunc__ even on functions that are not true ufuncs
>>
>
> I am somewhat opposed to this approach, because __array_ufunc__ is about
> overloading ufuncs, and as soon as we relax this guarantee the set of
> invariants __array_ufunc__ implementors rely on becomes much more limited.
>
> We really should have another mechanism for arbitrary function overloading
> in NumPy (NEP to follow shortly!).
>
>
>> - expand the semantics of core signatures so that a single matmul ufunc
>> can implement matrix-matrix, vector-matrix, matrix-vector, and
>> vector-vector multiplication.
>
>
> I was initially concerned that adding optional dimensions for gufuncs
> would introduce additional complexity for only the benefit of a single
> function (matmul), but I'm now convinced that it makes sense:
> 1. All other arithmetic overloads use __array_ufunc__, and it would be
> nice to keep @/matmul in the same place.
> 2. There's a common family of gufuncs for which optional dimensions like
> np.matmul make sense: matrix functions where 1D arrays should be treated as
> 2D row- or column-vectors.
>
> One example of this class of behavior would be np.linalg.solve, which
> could support vectors like Ax=b and matrices like Ax=B with the signature
> (m,m),(m,n?)->(m,n?). We couldn't immediately make np.linalg.solve a gufunc
> since it uses a subtly different dispatching rule, but it's the same
> use-case.
>
> Another example would be the "matrix transpose" function that has been
> occasionally proposed, to swap the last two dimensions of an array. It
>> could have the signature (m?,n)->(n,m?), which ensures that it is still well
> defined (as the identity) on 1d arrays.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
​
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Matrix operation

2018-05-22 Thread Eric Wieser
I’d recommend asking this kind of question on stackoverflow in future, but
you can do that with:

b = (a
    .reshape((2, 2, 4, 4))    # split up the (4,) axis into (2, 2)
    .transpose((2, 0, 3, 1))  # reorder to (4, 2, 4, 2)
    .reshape((8, 8))          # collapse adjacent dimensions
)
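
As a quick check against the array above:

    import numpy as np

    a = np.arange(64).reshape(4, 4, 4)
    b = a.reshape((2, 2, 4, 4)).transpose((2, 0, 3, 1)).reshape((8, 8))
    b[0]   # array([ 0, 16,  1, 17,  2, 18,  3, 19]) - the requested first row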

​

On Tue, 22 May 2018 at 21:31 Yu Peng  wrote:

> Hi, I want to make an operation like this:
>
> if I have a matrix:
>
> a=
>
> array([[[ 0,  1,  2,  3],
> [ 4,  5,  6,  7],
> [ 8,  9, 10, 11],
> [12, 13, 14, 15]],
>
>[[16, 17, 18, 19],
> [20, 21, 22, 23],
> [24, 25, 26, 27],
> [28, 29, 30, 31]],
>
>[[32, 33, 34, 35],
> [36, 37, 38, 39],
> [40, 41, 42, 43],
> [44, 45, 46, 47]],
>
>[[48, 49, 50, 51],
> [52, 53, 54, 55],
> [56, 57, 58, 59],
> [60, 61, 62, 63]]])
>
>
> and the shape of a is (4,4,4), I want to transform this tensor or matrix
> to (8,8), and the final result is like this:
> 0 16 1 17 2 18 3 19
> 32 48 33 49 34 50 35 51
> 4 20 5 21 6 22 7 23
> 36 52 37 53 38 54 39 55
> 8 24 9 25 10 26 11 27
> 40 56 41 57 42 58 43 59
> 12 28 13 29 14 30 15 31
> 44 60 45 61 46 62 47 63
> If you know how to deal with this matrix, please give me some
> suggestions. Thanks.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions

2018-04-30 Thread Eric Wieser
I think I’m -1 on this - this just makes things harder on the implementers
of __array_ufunc__, who now might have to work out which signature matches.
I’d prefer the solution where np.matmul is a wrapper around one of three
gufuncs (or maybe just around one with axis insertion) - this is similar to
how np.linalg already works.

Eric
​

On Mon, 30 Apr 2018 at 14:34 Stephan Hoyer  wrote:

> On Sun, Apr 29, 2018 at 2:48 AM Matti Picus  wrote:
>
>> The  proposed solution to issue #9029 is to extend the meaning of a
>> signature so "syntax like (n?,k),(k,m?)->(n?,m?) could mean that n and m
>> are optional dimensions; if missing in the input, they're treated as 1, and
>> then dropped from the output"
>
>
> I agree that this is an elegant fix for matmul, but are there other
> use-cases for "optional dimensions" in gufuncs?
>
> It feels a little wrong to add gufunc features if we can only think of one
> function that can use them.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.pad -- problem?

2018-04-29 Thread Eric Wieser
I would consider this a bug, and think we should fix this.

On Sun, 29 Apr 2018 at 13:48 Virgil Stokes  wrote:

> Here is a python code snippet:
>
> # python vers. 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC
> v.1900 64 bit (AMD64)]
> import numpy as np  # numpy vers. 1.14.3
> #import matplotlib.pyplot as plt
>
> N   = 21
> amp = 10
> t   = np.linspace(0.0,N-1,N)
> arg = 2.0*np.pi/(N-1)
>
> y = amp*np.sin(arg*t)
> print('y:\n',y)
> print('mean(y): ',np.mean(y))
>
> #plt.plot(t,y)
> #plt.show()
>
> ypad = np.pad(y, (3,2),'mean')
> print('ypad:\n',ypad)
>
> When I execute this the outputs are:
>
> y:
>  [ 0.e+00  3.09016994e+00  5.87785252e+00  8.09016994e+00
>   9.51056516e+00  1.e+01  9.51056516e+00  8.09016994e+00
>   5.87785252e+00  3.09016994e+00  1.22464680e-15 -3.09016994e+00
>  -5.87785252e+00 -8.09016994e+00 -9.51056516e+00 -1.e+01
>  -9.51056516e+00 -8.09016994e+00 -5.87785252e+00 -3.09016994e+00
>  -2.44929360e-15]
> mean(y):  -1.3778013372117948e-16
> ypad:
>  [-1.37780134e-16 -1.37780134e-16 -1.37780134e-16  0.e+00
>   3.09016994e+00  5.87785252e+00  8.09016994e+00  9.51056516e+00
>   1.e+01  9.51056516e+00  8.09016994e+00  5.87785252e+00
>   3.09016994e+00  1.22464680e-15 -3.09016994e+00 -5.87785252e+00
>  -8.09016994e+00 -9.51056516e+00 -1.e+01 -9.51056516e+00
>  -8.09016994e+00 -5.87785252e+00 -3.09016994e+00 -2.44929360e-15
>  -7.40148683e-17 -7.40148683e-17]
>
> The left pad is correct, but the right pad is different and not the mean
> of y --- why?
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changing the return type of np.histogramdd

2018-04-27 Thread Eric Wieser
It’s late and I’m probably missing something

The issue is not one of range as you showed there, but of precision. Here’s
the test case you’re missing:

def get_err(u64):
    """ return the absolute error incurred by storing a uint64 in a float64 """
    u64 = np.uint64(u64)
    return u64 - u64.astype(np.float64).astype(np.uint64)

The problem starts appearing with

>>> get_err(2**53 + 1)
1

and only gets worse as the size of the integers increases

>>> get_err(2**64 - 2*10)
9223372036854775788  # a lot bigger than float64 eps (although as a relative error, it's similar)

Either way, such weights don’t really happen in real code I think.

The counterexample I can think of is someone trying to implement
fixed-precision arithmetic with large integers. The intersection of people
doing both that and histogramdd is probably very small, but it’s at least
plausible.

Yes, there are cross-links to Python, SciPy and Matplotlib functions in the
docs.

Great, that was what I was unsure of. I was worried that linking to
upstream projects would be sort of weird, but practicality beats purity for
sure here.

Eric
​

On Fri, 27 Apr 2018 at 22:26 Ralf Gommers  wrote:

> On Wed, Apr 25, 2018 at 11:00 PM, Eric Wieser  > wrote:
>
>> For precision loss of the order of float64 eps, I disagree.
>>
>> I was thinking more about precision loss on the order of 1, for large
>> 64-bit integers that can’t fit in a float64
>>
> It's late and I'm probably missing something, but:
>
> >>> np.iinfo(np.int64).max > np.finfo(np.float64).max
> False
>
> Either way, such weights don't really happen in real code I think.
>
>
>> Note also that #10864 <https://github.com/numpy/numpy/issues/10864>
>> incurs deliberate precision loss of the order 10**-6 x smallest bin, which
>> is also much larger than eps.
>>
> Yeah that's worse.
>
>
>> It’s also possible to refer users to scipy.stats.binned_statistic
>>
>> That sounds like a good idea to do irrespective of whether histogramdd
>> has problems - I had no idea those existed. Is there a precedent for
>> referring to more feature-rich scipy functions from the basic numpy ones?
>>
> Yes, there are cross-links to Python, SciPy and Matplotlib functions in
> the docs. This is done with intersphinx (
> https://github.com/numpy/numpy/blob/master/doc/source/conf.py#L215).
> Example cross-link for convolve:
> https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.convolve.html
>
> Ralf
>
>
>
>> ​
>>
>> On Wed, 25 Apr 2018 at 22:51 Ralf Gommers  wrote:
>>
>>> On Wed, Apr 25, 2018 at 10:07 PM, Eric Wieser <
>>> wieser.eric+nu...@gmail.com> wrote:
>>>
>>>> what does that gain over having the user do something like
>>>> result.astype()
>>>>
>>>> It means that the user can use integer weights without worrying about
>>>> losing precision due to an intermediate float representation.
>>>>
>>>> It also means they can use higher precision values (np.longdouble) or
>>>> complex weights.
>>>>
>>> None of that seems particularly important to be honest.
>>>
>>> you’re emitting warnings for everyone
>>>>
>>>> When there’s a risk of precision loss, that seems like the responsible
>>>> thing to do.
>>>>
>>> For precision loss of the order of float64 eps, I disagree. There will
>>> be many such places in numpy and in other core libraries.
>>>
>>>
>>>> Users passing float weights would see no warning, I suppose.
>>>>
>>>> is this really worth a new function
>>>>
>>>> There ought to be a function for computing histograms with integer
>>>> weights that doesn’t lose precision. Either we change the existing function
>>>> to do that, or we make a new function.
>>>>
>>> It's also possible to refer users to
>>> scipy.stats.binned_statistic(_2d/dd), which provides a superset of the
>>> histogram functionality and is internally consistent because the
>>> implementations of 1d/2d call the dd one.
>>>
>>> Ralf
>>>
>>>
>>>
>>>> A possible compromise: like 1, but only change the dtype of the result
>>>> if a weights argument is passed.
>>>>
>>>> #10864 <https://github.com/numpy/numpy/issues/10864> seems like a
>>>> worrying design flaw too, but I suppose that can be dealt with separately.
>>>>
>>>> Eric
>>>> ​
>

Re: [Numpy-discussion] Changing the return type of np.histogramdd

2018-04-25 Thread Eric Wieser
For precision loss of the order of float64 eps, I disagree.

I was thinking more about precision loss on the order of 1, for large
64-bit integers that can’t fit in a float64

Note also that #10864 <https://github.com/numpy/numpy/issues/10864> incurs
deliberate precision loss of the order 10**-6 x smallest bin, which is also
much larger than eps.

It’s also possible to refer users to scipy.stats.binned_statistic

That sounds like a good idea to do irrespective of whether histogramdd has
problems - I had no idea those existed. Is there a precedent for referring
to more feature-rich scipy functions from the basic numpy ones?
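
For concreteness, statistic='count' reproduces the unweighted histogram, though
the result comes back as float64 just like histogramdd (a quick sketch, reusing
the earlier example):

>>> import numpy as np
>>> from scipy import stats
>>> x = np.linspace(0, 1)
>>> np.histogram(x*x, bins=4)[0]
array([25, 10,  8,  7], dtype=int64)
>>> stats.binned_statistic(x*x, x*x, statistic='count', bins=4).statistic
array([25., 10.,  8.,  7.])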

On Wed, 25 Apr 2018 at 22:51 Ralf Gommers  wrote:

> On Wed, Apr 25, 2018 at 10:07 PM, Eric Wieser wrote:
>
>> what does that gain over having the user do something like result.astype()
>>
>> It means that the user can use integer weights without worrying about
>> losing precision due to an intermediate float representation.
>>
>> It also means they can use higher precision values (np.longdouble) or
>> complex weights.
>>
> None of that seems particularly important to be honest.
>
> you’re emitting warnings for everyone
>>
>> When there’s a risk of precision loss, that seems like the responsible
>> thing to do.
>>
> For precision loss of the order of float64 eps, I disagree. There will be
> many such places in numpy and in other core libraries.
>
>
>> Users passing float weights would see no warning, I suppose.
>>
>> is this really worth a new function
>>
>> There ought to be a function for computing histograms with integer
>> weights that doesn’t lose precision. Either we change the existing function
>> to do that, or we make a new function.
>>
> It's also possible to refer users to scipy.stats.binned_statistic(_2d/dd),
> which provides a superset of the histogram functionality and is internally
> consistent because the implementations of 1d/2d call the dd one.
>
> Ralf
>
>
>
>> A possible compromise: like 1, but only change the dtype of the result if
>> a weights argument is passed.
>>
>> #10864 <https://github.com/numpy/numpy/issues/10864> seems like a
>> worrying design flaw too, but I suppose that can be dealt with separately.
>>
>> Eric
>> ​
>>
>> On Wed, 25 Apr 2018 at 21:57 Ralf Gommers  wrote:
>>
>>> On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser <
>>> wieser.eric+nu...@gmail.com> wrote:
>>>
>>>> Numpy has three histogram functions - histogram, histogram2d, and
>>>> histogramdd.
>>>>
>>>> histogram is by far the most widely used, and in the absence of
>>>> weights and normalization, returns an np.intp count for each bin.
>>>>
>>>> histogramdd (for which histogram2d is a wrapper) returns np.float64 in
>>>> all circumstances.
>>>>
>>>> As a contrived comparison
>>>>
>>>> >>> x = np.linspace(0, 1)
>>>> >>> h, e = np.histogram(x*x, bins=4); h
>>>> array([25, 10,  8,  7], dtype=int64)
>>>> >>> h, e = np.histogramdd((x*x,), bins=4); h
>>>> array([25., 10.,  8.,  7.])
>>>>
>>>> https://github.com/numpy/numpy/issues/7845 tracks this inconsistency.
>>>>
>>>> The fix is now trivial: the question is, will changing the return type
>>>> break people’s code?
>>>>
>>>> Either we should:
>>>>
>>>>1. Just change it, and hope no one is broken by it
>>>>2. Add a dtype argument:
>>>>   - If dtype=None, behave like np.histogram
>>>>   - If dtype is not specified, emit a future warning recommending
>>>>   to use dtype=None or dtype=float
>>>>   - In future, change the default to None
>>>>3. Create a new better-named function histogram_nd, which can also
>>>>be created without the mistake that is
>>>>https://github.com/numpy/numpy/issues/10864.
>>>>
>>>> Thoughts?
>>>>
>>>
>>> (1) seems like a no-go, taking such risks isn't justified by a minor
>>> inconsistency.
>>>
>>> (2) is still fairly intrusive, you're emitting warnings for everyone and
>>> still force people to change their code (and if they don't they may run
>>> into a backwards compat break).
>>>
>>> (3) is the best of these options, however is this really worth a new
>>> function? My vote would be "do nothing".
>>>
>>> Ralf
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changing the return type of np.histogramdd

2018-04-25 Thread Eric Wieser
what does that gain over having the user do something like result.astype()

It means that the user can use integer weights without worrying about
losing precision due to an intermediate float representation.

It also means they can use higher precision values (np.longdouble) or
complex weights.

you’re emitting warnings for everyone

When there’s a risk of precision loss, that seems like the responsible
thing to do. Users passing float weights would see no warning, I suppose.

is this really worth a new function

There ought to be a function for computing histograms with integer weights
that doesn’t lose precision. Either we change the existing function to do
that, or we make a new function.
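
A minimal sketch of the loss in question (my own illustration, with a uint64
weight big enough to exceed float64's 53-bit significand):

>>> import numpy as np
>>> w = np.array([2**63, 1], dtype=np.uint64)
>>> h, _ = np.histogramdd(np.array([[0.25], [0.75]]), bins=1, weights=w)
>>> int(h[0]) == 2**63 + 1  # the +1 is absorbed by the float64 accumulator
False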

A possible compromise: like 1, but only change the dtype of the result if a
weights argument is passed.
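
One hypothetical reading of that compromise, expressed as a dtype-selection
helper (a sketch, not merged code):

import numpy as np

def histogramdd_result_dtype(weights):
    # match np.histogram when no weights are given; otherwise follow the
    # weights so integer weights accumulate without an intermediate float
    return np.intp if weights is None else np.result_type(weights)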

#10864 <https://github.com/numpy/numpy/issues/10864> seems like a worrying
design flaw too, but I suppose that can be dealt with separately.

Eric

On Wed, 25 Apr 2018 at 21:57 Ralf Gommers  wrote:

> On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser 
> wrote:
>
>> Numpy has three histogram functions - histogram, histogram2d, and
>> histogramdd.
>>
>> histogram is by far the most widely used, and in the absence of weights
>> and normalization, returns an np.intp count for each bin.
>>
>> histogramdd (for which histogram2d is a wrapper) returns np.float64 in
>> all circumstances.
>>
>> As a contrived comparison
>>
>> >>> x = np.linspace(0, 1)
>> >>> h, e = np.histogram(x*x, bins=4); h
>> array([25, 10,  8,  7], dtype=int64)
>> >>> h, e = np.histogramdd((x*x,), bins=4); h
>> array([25., 10.,  8.,  7.])
>>
>> https://github.com/numpy/numpy/issues/7845 tracks this inconsistency.
>>
>> The fix is now trivial: the question is, will changing the return type
>> break people’s code?
>>
>> Either we should:
>>
>>1. Just change it, and hope no one is broken by it
>>2. Add a dtype argument:
>>   - If dtype=None, behave like np.histogram
>>   - If dtype is not specified, emit a future warning recommending to
>>   use dtype=None or dtype=float
>>   - In future, change the default to None
>>3. Create a new better-named function histogram_nd, which can also be
>>created without the mistake that is
>>https://github.com/numpy/numpy/issues/10864.
>>
>> Thoughts?
>>
>
> (1) seems like a no-go, taking such risks isn't justified by a minor
> inconsistency.
>
> (2) is still fairly intrusive, you're emitting warnings for everyone and
> still force people to change their code (and if they don't they may run
> into a backwards compat break).
>
> (3) is the best of these options, however is this really worth a new
> function? My vote would be "do nothing".
>
> Ralf
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding a return value to np.random.shuffle

2018-04-12 Thread Eric Wieser
I’m against this change, because it:

   - Is inconsistent with the builtin random.shuffle
   - Makes it easy to fall into the trap of assuming that np.random.shuffle
   does not mutate its input (see the sketch below)
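
If the goal is just the one-liner, a copying alternative already exists (shown
for contrast, not as part of the proposal):

>>> import numpy as np
>>> x = np.arange(5)
>>> y = np.random.permutation(x)  # returns a shuffled copy; x is untouched
>>> x
array([0, 1, 2, 3, 4])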

Eric

On Thu, 12 Apr 2018 at 10:37 Joseph Fox-Rabinovitz 
wrote:

> Would it break backwards compatibility to add the input as a return value
> to np.random.shuffle? I doubt anyone out there is relying on the None
> return value.
>
> The change is trivial, and allows shuffling a new array in one line
> instead of two:
>
> x = np.random.shuffle(np.array(some_junk))
>
> I've implemented the change in PR#10893.
>
> Regards,
>
> - Joe
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


  1   2   >