Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Daπid
On 30 September 2015 at 18:20, Nathaniel Smith  wrote:

> On Sep 30, 2015 2:28 AM, "Daπid"  wrote:
> [...]
> > Is there a nice way to ship both versions? After all, most
> implementations of BLAS and friends do spawn OpenMP threads, so I don't
> think it would be outrageous to take advantage of it in more places;
> provided there is a nice way to fallback to a serial version when it is not
> available.
>
> This is incorrect -- the only common implementation of BLAS that uses
> *OpenMP* threads is OpenBLAS, and even then it's not the default -- it only
> happens if you run it in a special non-default configuration.
>
Right, sorry. I meant to say that they spawn parallel threads. What do you mean
by a non-default configuration? Setting the OMP_NUM_THREADS?

> The challenges to providing transparent multithreading in numpy generally
> are:
>
> - gcc + OpenMP on linux still breaks multiprocessing. There's a patch to
> fix this but they still haven't applied it; alternatively there's a
> workaround you can use in multiprocessing (not using fork mode), but this
> requires every user update their code and the workaround has other
> limitations. We're unlikely to use OpenMP while this is the case.
>
Any idea when this is going to be released?

As I understand it, OpenBLAS doesn't have this problem, am I right?

> - parallel code in general is not very composable. If someone is calling a
> numpy operation from one thread, great, transparently using multiple
> threads internally is a win. If they're exploiting some higher-level
> structure in their problem to break it into pieces and process each in
> parallel, and then using numpy on each piece, then numpy spawning threads
> internally will probably destroy performance. And numpy is too low-level to
> know which case it's in. This problem exists to some extent already with
> multi-threaded BLAS, so people use various BLAS-specific knobs to manage it
> in ad hoc ways, but this doesn't scale.
>
> (Ironically OpenMP is more composable than most approaches to threading,
> but only if everyone is using it and, as per above, not everyone is and we
> currently can't.)
>
That is what I meant by also providing a single-threaded version. The user can
choose between the parallel and the serial version, depending on the case.


Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Sturla Molden

On 01/10/15 02:32, Juha Jeronen wrote:



Sounds good. Out of curiosity, are there any standard fork-safe
threadpools, or would this imply rolling our own?


You have to roll your own.

Basically, use pthread_atfork to register a callback that shuts down the 
threadpool before a fork and another callback that restarts it. Python's 
threading module does not expose the pthread_atfork function, so you 
must call it from Cython.
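
A minimal Cython sketch of that approach (assuming a hypothetical Python-level
threadpool object with shutdown() and restart() methods; error handling and the
GIL subtleties around fork are glossed over):

ctypedef void (*atfork_cb)() nogil

cdef extern from "pthread.h":
    int pthread_atfork(atfork_cb prepare, atfork_cb parent, atfork_cb child)

_pool = None   # set by install_fork_handlers()

cdef void _prepare() nogil:
    with gil:
        _pool.shutdown()      # quiesce the workers before fork()

cdef void _resume() nogil:
    with gil:
        _pool.restart()       # parent and child each rebuild their workers

def install_fork_handlers(pool):
    global _pool
    _pool = pool
    pthread_atfork(_prepare, _resume, _resume)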


I believe there is also a tiny atfork module in PyPI.


So maybe it would be better, at least at first, to make a pure-Cython
version with no attempt at multithreading?


I would start by making a pure Cython version that works correctly. The 
next step would be to ensure that it releases the GIL. After that you 
can worry about parallel processing, or just tell the user to use 
threads or joblib.



Sturla



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Sturla Molden

On 01/10/15 02:20, Juha Jeronen wrote:


Then again, the matter is further complicated by considering codes that
run on a single machine, versus codes that run on a cluster. Threads
being local to each node in a cluster,


You can run MPI programs on a single machine and you get OpenMP 
implementations for clusters. Just pick an API and stick with it.
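
For illustration, a small mpi4py sketch of that process-level split (run with
e.g. "mpiexec -n 4 python solve_batch.py", which works on a single machine as
well as on a cluster; the array shapes are made up):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
per_rank = 1000                                   # problem instances per process

coeffs = None
if rank == 0:
    coeffs = np.random.rand(size * per_rank, 5)   # batch of quartic coefficients
local = np.empty((per_rank, 5))
comm.Scatter(coeffs, local, root=0)               # one contiguous slice per rank
# ...solve the local slice here (optionally with threads within each node),
# then comm.Gather() the results back to rank 0...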


Sturla



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Juha Jeronen

On 01.10.2015 01:04, Sturla Molden wrote:

There are two principal problems with using OpenMP in NumPy:

One is that the GNU OpenMP threadpool is not fork-safe, and can break 
multiprocessing on some platforms (e.g. when using Python 2.7 on 
Linux). Anything that uses GCD has this nasty effect on Apple and 
FreeBSD as well. Note that the problem is actually in multiprocessing. 
It is not present on Windows (there is no fork system call) and it is 
avoidable even on Linux with Python 3.4 or later. Also, the default 
builds of NumPy and SciPy on Mac OS X use GCD by default.


The other problem is that NumPy with its iterator objects and gufuncs 
is not particularly fit for multithreading. There will be a lot of 
contention for the iterator object. Without a major redesign, one 
thread would do useful work while most of the others would be 
busy-waiting on a spinlock and fighting for iterator access. Not to 
mention that the iterator object would be false shared between the 
processors, which would trash the performance if you have more than 
one CPU, even when compared to using a single thread. This means that 
for multithreading to be useful in NumPy, the loops will have to be 
redesigned so that the work sharing between the threads is taken care 
of ahead of creating the iterator, i.e. allowing us to use one 
iterator per thread. Each iterator would then iterate over a slice of 
the original array. This is ok, but we cannot simply do this by adding 
OpenMP pragmas in the C code.


Thanks for the details.


Given the two problems mentioned above, it would likely be better to 
use a fork-safe threadpool instead of OpenMP.


Sounds good. Out of curiosity, are there any standard fork-safe 
threadpools, or would this imply rolling our own?



Also, after Chris's, Nathaniel's and your replies, I'm no longer sure an 
attempt at transparent multithreading belongs in the polynomial solver. 
Although the additional performance would be (very) nice when running it 
from a single-threaded program - after all, quad-cores are common these 
days - it is not just this specific solver that encounters the issue, 
but instead the same considerations apply to basically all NumPy operations.


So maybe it would be better, at least at first, to make a pure-Cython 
version with no attempt at multithreading?



 -J



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Juha Jeronen

On 30.09.2015 19:20, Nathaniel Smith wrote:
The challenges to providing transparent multithreading in numpy 
generally are:


- gcc + OpenMP on linux still breaks multiprocessing. There's a patch 
to fix this but they still haven't applied it; alternatively there's a 
workaround you can use in multiprocessing (not using fork mode), but 
this requires every user update their code and the workaround has 
other limitations. We're unlikely to use OpenMP while this is the case.




Ah, I didn't know this. Thanks.


- parallel code in general is not very composable. If someone is 
calling a numpy operation from one thread, great, transparently using 
multiple threads internally is a win. If they're exploiting some 
higher-level structure in their problem to break it into pieces and 
process each in parallel, and then using numpy on each piece, then 
numpy spawning threads internally will probably destroy performance. 
And numpy is too low-level to know which case it's in. This problem 
exists to some extent already with multi-threaded BLAS, so people use 
various BLAS-specific knobs to manage it in ad hoc ways, but this 
doesn't scale.


Very good point. I've had both kinds of use cases myself.

It would be nice if there was some way to tell NumPy to either use 
additional threads or not, but that adds complexity. It's also not a 
good solution, considering that any higher-level code building on NumPy, 
if it is designed to be at all reusable, may find *itself* in either 
role. Only the code that, at any particular point of time in the 
development of a software project, happens to form the top level at that 
time, has the required context...


Then again, the matter is further complicated by considering codes that 
run on a single machine, versus codes that run on a cluster. Threads 
being local to each node in a cluster, it may make sense in a solver 
targeted for a cluster to split the problem at the process level, 
distribute the processes across the network, and use the threading 
capability to accelerate computation on each node.


A complex issue with probably no easy solutions :)


 -J



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Juha Jeronen

On 30.09.2015 13:21, Pauli Virtanen wrote:

Juha Jeronen  jyu.fi> writes:

I recently developed a Cython-based, OpenMP-accelerated quartic (and
cubic, quadratic) polynomial solver to address a personal research need
for quickly solving a very large number of independent low-degree
polynomial equations for both real and complex coefficients.

My 2c in this context would be to also think how this best fits
in how collections of polynomials are represented in Numpy.

AFAICS, Numpy's polynomial module supports evaluation of polynomial
collections (cf. numpy.polynomial.polynomial.polyval),
but the corresponding root finding routine
(numpy.polynomial.polynomial.polyroots) only deals with one
polynomial at a time.

The present code in principle could be used to speed up the latter
routine after it is generalized to multiple polynomials. The general
case is probably doable just by coding up the companion matrix
approach using low-level routines (or maybe with the new vectorized
np.linalg routines).

Some relevant code elsewhere for similar purposes can be
found in scipy.interpolate.PPoly/BPoly (and in the future BSpline).

https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.PPoly.html

However, since it's piecewise, there's purposefully support only
for real-valued roots.


Thanks.

It sounds like numpy.polynomial.polynomial.polyroots() is a good 
candidate place for this, if as you suggest, it is first generalized to 
handle multiple polynomials.


 -J



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Juha Jeronen

On 30.09.2015 19:20, Chris Barker wrote:
On Wed, Sep 30, 2015 at 6:57 AM, Nathan Goldbaum wrote:


Note however that with the current version of the code, not
having OpenMP will severely limit the performance, especially
on quad-core machines, since an important factor in the speed
comes from the parallel processing of the independent problem
instances.


It seems you got much more than 4 times performance improvement


True, good point. That's the power of a short explicit algorithm for a 
specific problem :)


(And maybe some Cython, too.)


-- which is the most you'll get from four cores, presumably :-). not 
that another 4 times or so isn't a good thing.


True in general :)

To be precise, in this case, HyperThreading seems to offer some extra 
performance.


And there seems to be something wonky with the power management on my 
laptop. The number I posted (1.2e7 quartics per second) seemed slightly 
low compared to earlier informal benchmarking I had done, and indeed, 
benchmarking with 8 threads again I'm now getting 1.6e7 quartics per 
second (i7-4710MQ at 2.5 GHz, RAM at 1333 MHz, running on Linux Mint 17.2).


Here are some more complete benchmark results. Because the polynomials 
to be solved are randomly generated during each run of the program, 
unless otherwise noted, I've rerun the benchmark three times for each 
number of threads and picked the lowest-performance result, rounded down.


Number of threads vs. quartics solved per second:

1: 3.0e6
2: 6.1e6
4: 1.1e7
8: 1.6e7  (note: higher than previously!)

Threads vs. cubics:

1: 8.4e6
2: 1.6e7
4: 3.0e7
8: 4.6e7

Threads vs. quadratics:

1: 2.5e7
2: 4.8e7
4: 8.5e7 (typical across 9 benchmarks; lowest 5.7e7)
8: 9.5e7

From these, in general the weak scaling seems pretty good, and for 
quadratics and cubics, HyperThreading offers some extra performance, 
namely about 50% over the 4-thread result in both cases. The total 
speedup for quartics is 5.3x (3.6x without HT), and for cubics, 5.4x 
(3.5x without HT).


"About 4x, as expected" is still a good characterization, though :)


For quadratics, 4 threads seems to be the useful maximum; improvement 
with HT is only 11%, with a total speedup of 3.8x (vs. 3.4x without HT). 
These require much fewer arithmetic operations than the higher-order 
cases, so a possible explanation is that this case is memory-bound. 
Running the numbers on a simplistic estimate, the total of input and 
output is far from hitting the DDR3 bandwidth limit, but of course 
that's ignoring (at least) latencies and the pattern of data accesses.
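
As a back-of-the-envelope version of that estimate (assuming 3 real double
coefficients in and 2 complex double roots out per quadratic, and dual-channel
DDR3-1333):

rate = 9.5e7                        # quadratics per second at 8 threads (from above)
traffic = rate * (3 * 8 + 2 * 16)   # ~5.3 GB/s of input plus output
peak = 2 * 1333e6 * 8               # ~21 GB/s theoretical dual-channel DDR3-1333 peak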


Testing the completely silly "linear" case that is there for documenting 
code structure only, I get:


1: 1.9e8
2: 3.1e8
4: 3.6e8
8: 2.9e8

which should give a reasonable first approximation for the maximum 
practical throughput on this machine, when performing almost no arithmetic.


Anyway, returning from the sidetrack...



But I suppose there may be another technical option to support
multiple cores


python threads with nogil?


...maybe?

I hadn't thought about this option. I suppose that in pseudocode, the 
code structure would need to be something like this?


---8<---8<---8<---

from threading import Thread

cdef fast_solver(...) nogil:
    # ...do stuff without touching Python objects...

def my_thread_entry(j_start, j_end):
    cdef unsigned int j
    with nogil:
        for j in range(j_start, j_end):
            fast_solver(...)  # process local slice

t1 = Thread( target=my_thread_entry, args=(0, j_split) )
t2 = Thread( target=my_thread_entry, args=(j_split, j_end_global) )
t1.start()
t2.start()
t1.join()
t2.join()

---8<---8<---8<---


 -J



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Sturla Molden

On 30/09/15 18:20, Chris Barker wrote:


python threads with nogil?


This is often the easiest, particularly if we construct a fork-safe 
threadpool.



Sturla




Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Sturla Molden

On 30/09/15 15:57, Nathan Goldbaum wrote:


Basically, just try to compile a simple OpenMP test program in a
subprocess. If that succeeds, then great, we can add -fopenmp as a
compilation flag. If not, don't do that.


Unless you use GCC on Linux, it will be more complex than that. I.e. do 
you need -fopenmp, -openmp or /openmp? And which libraries do you need 
to link, if any? Link GOMP? Link some other OpenMP runtime libraries? 
Link pthreads? There are three different pthread implementations to 
choose between on Windows, depending on the GCC version (MinGW, MinGW-w64,
and Cygwin need different pthread libs for OpenMP). It gets messy. The 
only clean way is to add the correct flags and libraries into the 
compiler classes in numpy.distutils, and then let distutils and bento 
query those.
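
For illustration, the kind of per-compiler table that would have to live in (or
be queried from) those compiler classes -- the entries below are examples, not
an exhaustive or authoritative list:

OPENMP_FLAGS = {
    "gcc":     {"compile": ["-fopenmp"], "link": ["-fopenmp"]},   # pulls in libgomp
    "clang":   {"compile": ["-fopenmp"], "link": ["-fopenmp"]},   # needs an OpenMP runtime (libomp)
    "intel":   {"compile": ["-openmp"],  "link": ["-openmp"]},
    "msvc":    {"compile": ["/openmp"],  "link": []},             # vcomp linked automatically
    "mingw32": {"compile": ["-fopenmp"], "link": ["-fopenmp", "-lpthread"]},  # pthread lib varies
}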



Sturla








Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Sturla Molden

On 30/09/15 18:20, Chris Barker wrote:


We'd need a run-time check.


We need to amend the compiler classes in numpy.distutils with OpenMP 
relevant information (compiler flags and libraries).


The OpenMP support libraries must be statically linked.


Sturla



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Sturla Molden

On 30/09/15 18:20, Nathaniel Smith wrote:


This is incorrect -- the only common implementation of BLAS that uses
*OpenMP* threads is OpenBLAS,


MKL and ACML also use OpenMP.

Sturla






Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Sturla Molden

On 30/09/15 11:27, Daπid wrote:


Is there a nice way to ship both versions? After all, most
implementations of BLAS and friends do spawn OpenMP threads, so I don't
think it would be outrageous to take advantage of it in more places;


Some do, others don't.

ACML uses OpenMP.

GotoBLAS uses OpenMP.

Intel MKL uses Intel TBB and OpenMP (both of them).

OpenBLAS will by default use an internal threadpool. It can be 
configured to use OpenMP instead.


ATLAS uses its own threadpool.

Apple Accelerate Framework uses a kernel thread-pool called the Grand 
Central Dispatch (GCD). A library called libdispatch uses kqueue to 
access the GCD.




There are two principal problems with using OpenMP in NumPy:

One is that the GNU OpenMP threadpool is not fork-safe, and can break 
multiprocessing on some platforms (e.g. when using Python 2.7 on Linux). 
Anything that uses GCD has this nasty effect on Apple and FreeBSD as 
well. Note that the problem is actually in multiprocessing. It is not 
present on Windows (there is no fork system call) and it is avoidable 
even on Linux with Python 3.4 or later. Also, the default builds of NumPy 
and SciPy on Mac OS X use GCD by default.


The other problem is that NumPy with its iterator objects and gufuncs is 
not particularly fit for multithreading. There will be a lot of 
contention for the iterator object. Without a major redesign, one thread 
would do useful work while most of the others would be busy-waiting on a 
spinlock and fighting for iterator access. Not to mention that the 
iterator object would be false shared between the processors, which 
would trash the performance if you have more than one CPU, even when 
compared to using a single thread. This means that for multithreading to 
be useful in NumPy, the loops will have to be redesigned so that the 
work sharing between the threads is taken care of ahead of creating the 
iterator, i.e. allowing us to use one iterator per thread. Each iterator 
would then iterate over a slice of the original array. This is ok, but 
we cannot simply do this by adding OpenMP pragmas in the C code.
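
A rough illustration of the pre-split idea in plain Python (this is not NumPy
internals; "func" stands for some routine that releases the GIL internally):

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def apply_chunked(func, arr, nthreads=4):
    chunks = np.array_split(arr, nthreads)        # one independent view per thread
    with ThreadPoolExecutor(max_workers=nthreads) as ex:
        # Each worker gets its own slice, and hence its own iteration state;
        # func must release the GIL for this to give any real speedup.
        results = list(ex.map(func, chunks))
    return np.concatenate(results)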


Given the two problems mentioned above, it would likely be better to use 
a fork-safe threadpool instead of OpenMP.



Sturla



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Matthieu Brucher
After, I agree with you.

2015-09-30 18:14 GMT+01:00 Robert Kern :
> On Wed, Sep 30, 2015 at 10:35 AM, Matthieu Brucher
>  wrote:
>>
>> Yes, obviously, the code has NR parts, so it can't be licensed as BSD
>> as it is...
>
> It's not obvious to me, especially after Juha's further clarifications.
>
> --
> Robert Kern
>



-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher


Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Robert Kern
On Wed, Sep 30, 2015 at 10:35 AM, Matthieu Brucher <
matthieu.bruc...@gmail.com> wrote:
>
> Yes, obviously, the code has NR parts, so it can't be licensed as BSD
> as it is...

It's not obvious to me, especially after Juha's further clarifications.

--
Robert Kern


Re: [Numpy-discussion] Sign of NaN

2015-09-30 Thread Sebastian Berg
On Mi, 2015-09-30 at 09:11 -0700, Chris Barker wrote:
> On Tue, Sep 29, 2015 at 6:35 PM, Charles R Harris
>  wrote:
> For this, and other use-cases, special casing
> Numpy arrays stored in object arrays does make
> sense:
> 
> 
> "If this is s a Numpy array, pass the
> operation through."
> 
> 
> Because we now (development) use rich compare, the
> result looks like
> 
> 
> Oops, not what was intended. In fact it raises an error
> 
> In [7]: b
> Out[7]: array([array([ 1.,  1.,  1.]), array([-1., -1.,
> -1.])], dtype=object)
> 
> In [8]: sign(b)
> 
> ---
> ValueErrorTraceback (most
> recent call last)
>  in ()
> > 1 sign(b)
> 
> ValueError: The truth value of an array with more than one
> element is ambiguous. Use a.any() or a.all()
> 
> 
> exactly -- it seems to me that a special case for numpy arrays as
> objects in object arrays makes sense, so you'd get:
> 
> 
> In [6]: oa
> Out[6]: 
> array([[1.0, 1.0, 1.0],
>[-1.0, -1.0, -1.0]], dtype=object)
> 
> 
> In [7]: np.sign(oa)
> Out[7]: 
> array([[1, 1, 1],
>[-1, -1, -1]], dtype=object)
> 
> 
> (which you do now in the version I'm running).
> 
> 
> Though rather than the special case, maybe we really need
> dtype=ndarray arrays?
> 

I think this (as a dtype) is an obvious solution. The other solution, I
am not sure about in general to be honest. We may have to be more
careful about creating a monster with new dtypes, rather than being
careful to implement all possible features ;).
It is not that I think we would not have consistent rules, etc. it is
just that we *want* to force code to be obvious. If someone has arrays
inside arrays, maybe he should be expected to specify that.

It actually breaks some logic (or cannot be implemented for everything),
because we have signatures such as `O->?`, which does not support array
output.

- Sebastian

> 
>  oa = np.array([a1, a2], dtype=np.ndarray)
> 
> 
> 
> Then we could count on everything in the array being an array.
> 
> 
> -CHB
> 
> 
> -- 
> 
> Christopher Barker, Ph.D.
> Oceanographer
> 
> Emergency Response Division
> NOAA/NOS/OR&R(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
> 
> chris.bar...@noaa.gov





Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Nathaniel Smith
On Sep 30, 2015 2:28 AM, "Daπid"  wrote:
[...]
> Is there a nice way to ship both versions? After all, most
implementations of BLAS and friends do spawn OpenMP threads, so I don't
think it would be outrageous to take advantage of it in more places;
provided there is a nice way to fallback to a serial version when it is not
available.

This is incorrect -- the only common implementation of BLAS that uses
*OpenMP* threads is OpenBLAS, and even then it's not the default -- it only
happens if you run it in a special non-default configuration.

The challenges to providing transparent multithreading in numpy generally
are:

- gcc + OpenMP on linux still breaks multiprocessing. There's a patch to
fix this but they still haven't applied it; alternatively there's a
workaround you can use in multiprocessing (not using fork mode), but this
requires every user update their code and the workaround has other
limitations. We're unlikely to use OpenMP while this is the case.

- parallel code in general is not very composable. If someone is calling a
numpy operation from one thread, great, transparently using multiple
threads internally is a win. If they're exploiting some higher-level
structure in their problem to break it into pieces and process each in
parallel, and then using numpy on each piece, then numpy spawning threads
internally will probably destroy performance. And numpy is too low-level to
know which case it's in. This problem exists to some extent already with
multi-threaded BLAS, so people use various BLAS-specific knobs to manage it
in ad hoc ways, but this doesn't scale.
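
For concreteness, the kind of BLAS-specific knobs meant here (the variable
names differ between implementations and are typically read when the library
is loaded, so they have to be set before importing numpy):

import os
os.environ["OPENBLAS_NUM_THREADS"] = "1"   # OpenBLAS with its internal threadpool
os.environ["OMP_NUM_THREADS"] = "1"        # OpenMP-based builds (OpenBLAS/OpenMP, MKL)
os.environ["MKL_NUM_THREADS"] = "1"        # Intel MKL
import numpy as np                         # import only after setting the variables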

(Ironically OpenMP is more composable than most approaches to threading,
but only if everyone is using it and, as per above, not everyone is and we
currently can't.)

-n


Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Chris Barker
On Wed, Sep 30, 2015 at 6:57 AM, Nathan Goldbaum 
wrote:

> Note however that with the current version of the code, not having OpenMP
>> will severely limit the performance, especially on quad-core machines,
>> since an important factor in the speed comes from the parallel processing
>> of the independent problem instances.
>>
>
It seems you got much more than 4 times performance improvement -- which is
the most you'll get from four cores, presumably :-). not that another 4
times or so isn't a good thing.

But I suppose there may be another technical option to support multiple
>> cores
>
>
python threads with nogil?


> For what it's worth (no idea what the order of magnitude of the technical
> risk is for something like this in a library like numpy), it's actually
> quite simple to dynamically test for OpenMP support at install time.
>
>

> Basically, just try to compile a simple OpenMP test program in a
> subprocess. If that succeeds, then great, we can add -fopenmp as a
> compilation flag. If not, don't do that.
>

this sounds like a compilation-time test, not an install-time one. And
install time isn't really helpful either; we really want plain old wheels,
etc. to work.

We'd need a run-time check.

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Sign of NaN

2015-09-30 Thread Chris Barker
On Wed, Sep 30, 2015 at 12:32 AM, Sebastian Berg  wrote:

> > >> Plus we hope that many use cases for object arrays will soon be
> > >> supplanted by better dtype support, so now may not be the best time to
> > >> invest heavily in making object arrays complicated and powerful.
>
> Well, what I mean is. A
> `Decimal` will probably never know about numpy
> itself. So I was wondering if you should teach numpy the other way
> around about it.
>

indeed -- but the way to do that is to create a Decimal dtype -- if we have
the "better dtype support", then that shouldn't be hard to do.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Sign of NaN

2015-09-30 Thread Chris Barker
On Tue, Sep 29, 2015 at 6:35 PM, Charles R Harris  wrote:

> For this, and other use-cases, special casing Numpy arrays stored in
>>> object arrays does make sense:
>>>
>>> "If this is s a Numpy array, pass the operation through."
>>>
>>
>> Because we now (development) use rich compare, the result looks like
>>
>> Oops, not what was intended. In fact it raises an error
>
> In [7]: b
> Out[7]: array([array([ 1.,  1.,  1.]), array([-1., -1., -1.])],
> dtype=object)
>
> In [8]: sign(b)
> ---
> ValueErrorTraceback (most recent call last)
>  in ()
> > 1 sign(b)
>
> ValueError: The truth value of an array with more than one element is
> ambiguous. Use a.any() or a.all()
>

exactly -- it seems to me that a special case for numpy arrays as objects
in object arrays makes sense, so you'd get:

In [6]: oa
Out[6]:
array([[1.0, 1.0, 1.0],
   [-1.0, -1.0, -1.0]], dtype=object)

In [7]: np.sign(oa)
Out[7]:
array([[1, 1, 1],
   [-1, -1, -1]], dtype=object)

(which you do now in the version I'm running).

Though rather than the special case, maybe we really need dtype=ndarray
arrays?

 oa = np.array([a1, a2], dtype=np.ndarray)

Then we could count on everything in the array being an array.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] composition of the steering council (was Re: Governance model request)

2015-09-30 Thread Chris Barker
On Tue, Sep 29, 2015 at 7:32 AM, Travis Oliphant 
wrote:

> I'm in a situation now where at least for 6 months or so I can help with
> NumPy more than I have been able to for 7 years.
>

great news!


> 1) 1 year of inactivity to be removed from the council is too little for a
> long-running project like NumPy
>

I always read this as "activity in council matters", not "activity in
commits to the code". If that's not what is meant, then we should re-write
that. So no, Travis is not ineligible for the council, as there has been no
council to be active on.

And in that case, my vague memory is that Travis has popped his head into
architecture discussions on this list at least once a year essentially
forever, so there is every indication that he has been and will be active
in that sense.

--- somewhere between 2 and 4 years would be more appropriate.   I suppose
> 1 year of inactivity is fine if that is defined only as "failure to vote on
> matters before the council"
>

exactly. -- I would expand "activity" to mean more than voting, but that's
the idea. More like active participation in discussions / issues, and/or
attending meetings, if there are any.


> 2) The seed council should not just be recent contributors but should
> include as many people as are willing to help who have a long history with
> the project.
>

again, I don't think we need so much emphasis in the "seed council" that is
only needed to have SOME way to start. Though I guess as long as we're
having the discussion, we should have an outline of the expectations for
the council in general, which should indeed include a couple members with
long history.


> 3) I think people who contribute significantly generally should be able to
> re-join the steering council more easily than "going through the 1-year
> vetting process" again --- they would have to be approved by the current
> steering council but it should not be automatically disallowed (thus
> requiring the equivalent of an amendment to change it).
>

Sure -- give the Council Flexibility to keep the council current. However,
the idea here is that we want some continuity -- not have people popping on
and off the council on short time spans as their schedules allow.

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Nathan Goldbaum
On Wed, Sep 30, 2015 at 3:20 AM, Juha Jeronen  wrote:

> Hi,
>
> On 30.09.2015 03:48, Chris Barker - NOAA Federal wrote:
>
>> This sounds pretty cool -- and I've had a use case. So it would be
>> nice to get into Numpy.
>>
>> But: I doubt we'd want OpenMP dependence in Numpy itself.
>>
>
> It is indeed a good point not to add new dependencies for a small feature
> such as this. From a software engineering point of view, I fully agree.
>
> Note however that with the current version of the code, not having OpenMP
> will severely limit the performance, especially on quad-core machines,
> since an important factor in the speed comes from the parallel processing
> of the independent problem instances.
>
> But I suppose there may be another technical option to support multiple
> cores (doesn't NumPy already do this in some of its routines?). OpenMP was
> just the most convenient solution from the implementation viewpoint.
>
>
> But maybe a pure Cython non-MP version?
>>
>
> That is of course possible. IIRC, changing the processing to occur on a
> single core should only require replacing Cython.parallel.prange() with
> range() in the array processing loops.
>
> (Note range(), not xrange(), even for Python 2.x - Cython compiles a loop
> using range() into an efficient C loop if some simple compile-time
> conditions are fulfilled.)
>
>
> Are we putting Cuthon in Numpy now? I lost track.
>>
>
> I thought so? But then again, I'm just a user :)


For what it's worth (no idea what the order of magnitude of the technical
risk is for something like this in a library like numpy), it's actually
quite simple to dynamically test for OpenMP support at install time.

Here's what we do in yt:

https://bitbucket.org/yt_analysis/yt/src/f8934e13abaf349f58c52c076320d44ee4f61a7f/yt/utilities/lib/setup.py?at=yt&fileviewer=file-view-default#setup.py-8

Basically, just try to compile a simple OpenMP test program in a
subprocess. If that succeeds, then great, we can add -fopenmp as a
compilation flag. If not, don't do that.
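
For reference, a bare-bones sketch of that check (the file names and the
hard-coded -fopenmp flag are illustrative; as noted elsewhere in the thread,
the flag and the libraries to link vary by compiler):

import os, shutil, subprocess, tempfile

def have_openmp(cc="cc"):
    src = """
    #include <omp.h>
    #include <stdio.h>
    int main(void) {
    #pragma omp parallel
        printf("thread %d\\n", omp_get_thread_num());
        return 0;
    }
    """
    tmpdir = tempfile.mkdtemp()
    try:
        srcfile = os.path.join(tmpdir, "test_openmp.c")
        with open(srcfile, "w") as f:
            f.write(src)
        exe = os.path.join(tmpdir, "test_openmp")
        return subprocess.call([cc, "-fopenmp", srcfile, "-o", exe]) == 0
    finally:
        shutil.rmtree(tmpdir)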

This was implemented when Apple switched from GCC 4.2 to LLVM/Clang in OS X
10.7. We've had exactly one bug report about this, and it wasn't even
related to OpenMP, instead it was an issue with the way we were using
subprocess which broke when someone had set "CC = ccache gcc".

Hope that's helpful,

Nathan Goldbaum


>
>
>
>  -J
>
> -
> Juha Jeronen, Ph.D.
> juha.jero...@jyu.fi
> University of Jyväskylä
> Department of Mathematical Information Technology
> -
>


Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Pauli Virtanen
Juha Jeronen  jyu.fi> writes:
> I recently developed a Cython-based, OpenMP-accelerated quartic (and 
> cubic, quadratic) polynomial solver to address a personal research need 
> for quickly solving a very large number of independent low-degree 
> polynomial equations for both real and complex coefficients.

My 2c in this context would be to also think how this best fits
in how collections of polynomials are represented in Numpy.

AFAICS, Numpy's polynomial module supports evaluation of polynomial
collections (cf. numpy.polynomial.polynomial.polyval), 
but the corresponding root finding routine
(numpy.polynomial.polynomial.polyroots) only deals with one
polynomial at a time.

The present code in principle could be used to speed up the latter
routine after it is generalized to multiple polynomials. The general
case is probably doable just by coding up the companion matrix
approach using low-level routines (or maybe with the new vectorized
np.linalg routines).
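
As a rough sketch of that companion-matrix idea (coefficients ordered
low-to-high as in numpy.polynomial; illustrative only, with no handling of
zero leading coefficients or mixed degrees):

import numpy as np

def polyroots_many(coeffs):
    # coeffs: shape (m, n+1), row k holds c[0] + c[1]*x + ... + c[n]*x**n
    coeffs = np.asarray(coeffs)
    m, n = coeffs.shape[0], coeffs.shape[1] - 1
    monic = coeffs[:, :-1] / coeffs[:, -1:]       # normalize leading coefficient to 1
    comp = np.zeros((m, n, n), dtype=monic.dtype)
    comp[:, 1:, :-1] = np.eye(n - 1)              # ones on the subdiagonal
    comp[:, :, -1] = -monic                       # last column holds -c[0..n-1]
    return np.linalg.eigvals(comp)                # stacked eigenvalues, shape (m, n)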

Some relevant code elsewhere for similar purposes can be
found in scipy.interpolate.PPoly/BPoly (and in the future BSpline).

https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.PPoly.html

However, since it's piecewise, there's purposefully support only
for real-valued roots.

-- 
Pauli Virtanen



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Juha Jeronen

Hi,

What qualifies as an NR part?

(See my previous message concerning the references; NR is not the only 
source which has these algorithms. Again, the code itself is original to 
this solver, I haven't even looked at the code provided with NR.)


 -J


-
Juha Jeronen, Ph.D.
juha.jero...@jyu.fi
University of Jyväskylä
Department of Mathematical Information Technology
-


On 30.09.2015 12:35, Matthieu Brucher wrote:

Yes, obviously, the code has NR parts, so it can't be licensed as BSD
as it is...

Matthieu

2015-09-30 2:37 GMT+01:00 Charles R Harris :


On Tue, Sep 29, 2015 at 6:48 PM, Chris Barker - NOAA Federal
 wrote:

This sounds pretty cool -- and I've had a use case. So it would be
nice to get into Numpy.

But: I doubt we'd want OpenMP dependence in Numpy itself.

But maybe a pure Cython non-MP version?

Are we putting Cython in Numpy now? I lost track.


Yes, but we need to be careful of Numerical Recipes.



Chuck









Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Matthieu Brucher
Yes, obviously, the code has NR parts, so it can't be licensed as BSD
as it is...

Matthieu

2015-09-30 2:37 GMT+01:00 Charles R Harris :
>
>
> On Tue, Sep 29, 2015 at 6:48 PM, Chris Barker - NOAA Federal
>  wrote:
>>
>> This sounds pretty cool -- and I've had a use case. So it would be
>> nice to get into Numpy.
>>
>> But: I doubt we'd want OpenMP dependence in Numpy itself.
>>
>> But maybe a pure Cython non-MP version?
>>
>> Are we putting Cython in Numpy now? I lost track.
>
>
> Yes, but we need to be careful of Numerical Recipes.
>
> 
>
> Chuck
>



-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher


Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Daπid
On 30 September 2015 at 10:20, Juha Jeronen  wrote:

>
> Are we putting Cython in Numpy now? I lost track.
>>
>
> I thought so? But then again, I'm just a user :)


As of this moment, there are three Cython modules in Numpy, so yes.
Releases ship the C generated modules, so it is not a dependency.

./numpy/distutils/tests/pyrex_ext/primes.pyx
./numpy/random/mtrand/mtrand.pyx
./tools/allocation_tracking/alloc_hook.pyx

In any case, we should always include a single threaded version because
sometimes the computations are already parallelised at a higher level.

Is there a nice way to ship both versions? After all, most implementations
of BLAS and friends do spawn OpenMP threads, so I don't think it would be
outrageous to take advantage of it in more places; provided there is a nice
way to fallback to a serial version when it is not available.


/David.


Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Juha Jeronen

Hi,

On 30.09.2015 04:37, Charles R Harris wrote:



On Tue, Sep 29, 2015 at 6:48 PM, Chris Barker - NOAA Federal wrote:


This sounds pretty cool -- and I've had a use case. So it would be
nice to get into Numpy.

But: I doubt we'd want OpenMP dependence in Numpy itself.

But maybe a pure Cython non-MP version?

Are we putting Cython in Numpy now? I lost track.


Yes, but we need to be careful of Numerical Recipes.


True.

As I said, only the algorithms come from NR, and the code was written 
from scratch. I haven't even looked at any of their code, precisely due 
to the restrictive license.



In this particular case, the referred pages (pp. 227-229, 3rd ed.) 
contain only a description of the solution process in English, with 
equations - there is no example code printed into the book in section 
5.6, Quadratic and Cubic Equations, which in its entirety is contained 
on pp. 227-229.


Furthermore, on p. 228, equation (5.6.12), which contains the solutions 
of the cubic equation, is attributed to Francois Viete. The reference 
given is the treatise "De emendatione", published in 1615.


The same solution, also with attribution to Francois Viete, is given in 
Wikipedia, but without especially defining temporary variables to 
eliminate some common subexpressions:


https://en.wikipedia.org/wiki/Cubic_function#Trigonometric_.28and_hyperbolic.29_method


The solution to the quadratic equation is the standard one, but with 
special care taken to avoid catastrophic cancellation in computing the 
other root.
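
For reference, a sketch of that remedy for real coefficients (essentially the
Kahan/NR formulation; edge cases such as b == 0 or a negative discriminant are
ignored here):

import math

def stable_quadratic_roots(a, b, c):
    d = b * b - 4.0 * a * c                          # discriminant, assumed >= 0
    q = -0.5 * (b + math.copysign(math.sqrt(d), b))
    return q / a, c / q                              # neither root is lost to cancellation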


Wikipedia mentions that in the standard formula, this issue exists:

https://en.wikipedia.org/wiki/Quadratic_equation#Avoiding_loss_of_significance

but does not give an explicit remedy. References given are

Kahan, William (November 20, 2004), On the Cost of Floating-Point 
Computation Without Extra-Precise Arithmetic. ( 
http://www.cs.berkeley.edu/~wkahan/Qdrtcs.pdf , see esp. p. 8, which 
provides the same remedy as NR )


Higham, Nicholas (2002), Accuracy and Stability of Numerical Algorithms 
(2nd ed.), SIAM, p. 10, ISBN 978-0-89871-521-7.



(Browsing through Kahan's text, I see now that I could improve the 
discriminant computation to handle marginal cases better. Might do this 
in a future version.)



For both of these cases, I've used NR as a reference, because the 
authors have taken special care to choose only numerically stable 
approaches - which is absolutely critical, yet something that sadly not 
all mathematics texts care about.



The quartic equation is not treated in NR. The reduction trick is 
reported in Wikipedia:


https://en.wikipedia.org/wiki/Quartic_function#Solving_by_factoring_into_quadratics

where the original reference given is

Brookfield, G. (2007). "Factoring quartic polynomials: A lost art". 
Mathematics Magazine 80 (1): 67–70.


(This seemed an elegant approach, given stable solvers for cubics and 
quadratics.)



 -J

-
Juha Jeronen, Ph.D.
juha.jero...@jyu.fi
University of Jyväskylä
Department of Mathematical Information Technology
-



Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Juha Jeronen

Hi,

On 30.09.2015 03:48, Chris Barker - NOAA Federal wrote:

This sounds pretty cool -- and I've had a use case. So it would be
nice to get into Numpy.

But: I doubt we'd want OpenMP dependence in Numpy itself.


It is indeed a good point not to add new dependencies for a small 
feature such as this. From a software engineering point of view, I fully 
agree.


Note however that with the current version of the code, not having 
OpenMP will severely limit the performance, especially on quad-core 
machines, since an important factor in the speed comes from the parallel 
processing of the independent problem instances.


But I suppose there may be another technical option to support multiple 
cores (doesn't NumPy already do this in some of its routines?). OpenMP 
was just the most convenient solution from the implementation viewpoint.




But maybe a pure Cython non-MP version?


That is of course possible. IIRC, changing the processing to occur on a 
single core should only require replacing Cython.parallel.prange() with 
range() in the array processing loops.


(Note range(), not xrange(), even for Python 2.x - Cython compiles a 
loop using range() into an efficient C loop if some simple compile-time 
conditions are fulfilled.)
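
For illustration, the kind of swap meant, as a minimal sketch with hypothetical
names (solve_one stands for the per-problem kernel):

from cython.parallel import prange

cdef void solve_one(double* coeffs, double complex* roots) nogil:
    pass  # placeholder for the actual solver kernel

def solve_all_parallel(double[:, ::1] coeffs, double complex[:, ::1] roots):
    cdef Py_ssize_t j
    for j in prange(coeffs.shape[0], nogil=True):   # OpenMP-parallel loop
        solve_one(&coeffs[j, 0], &roots[j, 0])

def solve_all_serial(double[:, ::1] coeffs, double complex[:, ::1] roots):
    cdef Py_ssize_t j
    for j in range(coeffs.shape[0]):                # compiles to a plain C loop
        solve_one(&coeffs[j, 0], &roots[j, 0])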




Are we putting Cython in Numpy now? I lost track.


I thought so? But then again, I'm just a user :)


 -J

-
Juha Jeronen, Ph.D.
juha.jero...@jyu.fi
University of Jyväskylä
Department of Mathematical Information Technology
-



Re: [Numpy-discussion] Sign of NaN

2015-09-30 Thread Sebastian Berg
On Mi, 2015-09-30 at 00:01 -0700, Nathaniel Smith wrote:
> On Tue, Sep 29, 2015 at 2:07 PM, Sebastian Berg
>  wrote:
> > On Di, 2015-09-29 at 11:16 -0700, Nathaniel Smith wrote:
> [...]
> >> In general I'm not a big fan of trying to do all kinds of guessing
> >> about how to handle random objects in object arrays, the kind that
> >> ends up with a big chain of type checks and fallback behaviors. Pretty
> >> soon we find ourselves trying to extend the language with our own
> >> generic dispatch system for arbitrary python types, just for object
> >> arrays. (The current hack where for object arrays np.log will try
> >> calling obj.log() is particularly horrible. There is no rule in python
> >> that "log" is a reserved method name for "logarithm" on arbitrary
> >> objects. Ditto for the other ufuncs that implement this hack.)
> >>
> >> Plus we hope that many use cases for object arrays will soon be
> >> supplanted by better dtype support, so now may not be the best time to
> >> invest heavily in making object arrays complicated and powerful.
> >>
> >
> > I have the little dream here that what could happen is that we create a
> > PyFloatDtype kind of thing (it is a bit different from our float because
> > it would always convert back to a python float and maybe raises more
> > errors), which "registers" with the dtype system in that it says "I know
> > how to handle python floats and store them in an array and provide ufunc
> > implementations for it".
> >
> > Then, the "object" dtype ufuncs would try to call the ufunc on each
> > element, including "conversion". They would find a "float", since it is
> > not an array-like container, they interpret it as a PyFloatDtype scalar
> > and call the scalars ufunc (the PyFloatDtype scalar would be a python
> > float).
> 
> I'm not sure I understand this, but it did make me think of one
> possible approach --
> 
> in my notebook sketches for what the New and Improved ufunc API might
> look like, I was already pondering whether the inner loop should
> receive a pointer to the ufunc object itself. Not for any reason in
> particular, but just because hey they're sorta vaguely like methods
> and methods get pointers to the object. But now I know what this is
> useful for :-).
> 
> If ufunc loops get a pointer to the ufunc object itself, then we can
> define a single inner loop function that looks like (sorta-Cython
> code):
> 
> cdef generic_object_inner_loop(ufunc, args, strides, n, ...):
>     for i in range(n):
>         arg_objs = []
>         for j in range(ufunc.narg):
>             arg_objs.append( (args[j] + strides[j] * i))
>         ufunc(*arg_objs[:ufunc.nin], out=arg_objs[ufunc.nin:])
> 
> and register it by default in every ufunc with signature
> "{}->{}".format("O" * ufunc.nin, "O" * ufunc.nout). And this would in
> just a few lines of code provide a pretty sensible generic behavior
> for *all* object array ufuncs -- they recursively call the ufunc on
> their contents.
> 
> As a prerequisite of course we would need to remove the auto-coercion
> of unknown objects to object arrays, otherwise this becomes an
> infinite recursion. But we already decided to do that.
> 
> And for this to be really useful for arbitrary objects, not just the
> ones that asarray recognizes, then we need __numpy_ufunc__. But again,
> we already decided to do that :-).
> 

Well, what I mean is. A `Decimal` will probably never know about numpy
itself. So I was wondering if you should teach numpy the other way
around about it.
I.e. you would create an object which has all the information about
ufuncs and casting for Decimal and register it with numpy. Then when
numpy sees a Decimal (also in `asarray`) it would know what to do with
them, how to store them in an array, etc. The `Decimal` object would be
the scalar version of an array of Decimals.
By the way, in some way an array is a "Scalar" as well, it can be put
into another array and if you apply the ufunc to it, it applies the
ufunc to all its elements.

This all is likely too complicated though, maybe it is better to just
force the user to subclass the Decimal to achieve this. I am sure there
are quite a few roads we could go and we just need to think about it
some more about what we want and what we can do. :)

- Sebastian


> -n
> 





Re: [Numpy-discussion] Sign of NaN

2015-09-30 Thread Nathaniel Smith
On Tue, Sep 29, 2015 at 2:07 PM, Sebastian Berg
 wrote:
> On Di, 2015-09-29 at 11:16 -0700, Nathaniel Smith wrote:
[...]
>> In general I'm not a big fan of trying to do all kinds of guessing
>> about how to handle random objects in object arrays, the kind that
>> ends up with a big chain of type checks and fallback behaviors. Pretty
>> soon we find ourselves trying to extend the language with our own
>> generic dispatch system for arbitrary python types, just for object
>> arrays. (The current hack where for object arrays np.log will try
>> calling obj.log() is particularly horrible. There is no rule in python
>> that "log" is a reserved method name for "logarithm" on arbitrary
>> objects. Ditto for the other ufuncs that implement this hack.)
>>
>> Plus we hope that many use cases for object arrays will soon be
>> supplanted by better dtype support, so now may not be the best time to
>> invest heavily in making object arrays complicated and powerful.
>>
>
> I have the little dream here that what could happen is that we create a
> PyFloatDtype kind of thing (it is a bit different from our float because
> it would always convert back to a python float and maybe raises more
> errors), which "registers" with the dtype system in that it says "I know
> how to handle python floats and store them in an array and provide ufunc
> implementations for it".
>
> Then, the "object" dtype ufuncs would try to call the ufunc on each
> element, including "conversion". They would find a "float", since it is
> not an array-like container, they interpret it as a PyFloatDtype scalar
> and call the scalars ufunc (the PyFloatDtype scalar would be a python
> float).

I'm not sure I understand this, but it did make me think of one
possible approach --

in my notebook sketches for what the New and Improved ufunc API might
look like, I was already pondering whether the inner loop should
receive a pointer to the ufunc object itself. Not for any reason in
particular, but just because hey they're sorta vaguely like methods
and methods get pointers to the object. But now I know what this is
useful for :-).

If ufunc loops get a pointer to the ufunc object itself, then we can
define a single inner loop function that looks like (sorta-Cython
code):

cdef generic_object_inner_loop(ufunc, args, strides, n, ...):
    for i in range(n):
        arg_objs = []
        for j in range(ufunc.narg):
            arg_objs.append( (args[j] + strides[j] * i))
        ufunc(*arg_objs[:ufunc.nin], out=arg_objs[ufunc.nin:])

and register it by default in every ufunc with signature
"{}->{}".format("O" * ufunc.nin, "O" * ufunc.nout). And this would in
just a few lines of code provide a pretty sensible generic behavior
for *all* object array ufuncs -- they recursively call the ufunc on
their contents.

As a prerequisite of course we would need to remove the auto-coercion
of unknown objects to object arrays, otherwise this becomes an
infinite recursion. But we already decided to do that.

And for this to be really useful for arbitrary objects, not just the
ones that asarray recognizes, then we need __numpy_ufunc__. But again,
we already decided to do that :-).

-n

-- 
Nathaniel J. Smith -- http://vorpus.org