On 10/08/2015 03:30 PM, David Cournapeau wrote:


On Tue, Oct 6, 2015 at 8:04 PM, Nathaniel Smith <n...@pobox.com> wrote:

    On Tue, Oct 6, 2015 at 11:52 AM, David Cournapeau
    <courn...@gmail.com> wrote:
     >
     >
      > On Tue, Oct 6, 2015 at 7:30 PM, Nathaniel Smith
      > <n...@pobox.com> wrote:
     >>
     >> [splitting this off into a new thread]
     >>
      >> On Tue, Oct 6, 2015 at 3:00 AM, David Cournapeau
      >> <courn...@gmail.com> wrote:
     >> [...]
      >> > I also agree the current situation is not sustainable -- as we
      >> > discussed privately before, cythonizing numpy.core is made quite
      >> > more complicated by this. I have myself quite a few issues w/
      >> > cythonizing the other parts of umath. I would also like to support
      >> > the static link better than we do now (do we know some static link
      >> > users we can contact to validate our approach?)
     >> >
     >> > Currently, what we have in numpy core is the following:
     >> >
      >> > numpy.core.multiarray -> compilation units in
      >> > numpy/core/src/multiarray/ + statically link npymath
      >> > numpy.core.umath -> compilation units in numpy/core/src/umath +
      >> > statically link npymath/npysort + some shenanigans to use things
      >> > in numpy.core.multiarray
     >>
     >> There are also shenanigans in the other direction - supposedly umath
     >> is layered "above" multiarray, but in practice there are circular
     >> dependencies (see e.g. np.set_numeric_ops).
     >
     > Indeed, I am not arguing about merging umath and multiarray.

    Oh, okay :-).

    >> > I would suggest to have a more layered approach, to enable both
    >> > 'normal' build and static build, without polluting the public
    >> > namespace too much. This is an approach followed by most large
    >> > libraries (e.g. MKL), and is fairly flexible.
    >> >
    >> > Concretely, we could start by putting more common functionalities (aka
    >> > the
    >> > 'core' library) into its own static library. The API would be
    >> > considered
    >> > private to numpy (no stability guaranteed outside numpy), and every
    >> > exported
    >> > symbol from that library would be decorated appropriately to avoid
    >> > potential
    >> > clashes (e.g. '_npy_internal_').
    >>
    >> I don't see why we need this multi-layered complexity, though.
    >
    >
    > For several reasons:
    >
    >  - when you want to cythonize either extension, it is much easier to
    > separate it as cython for CPython API, C for the rest.

    I don't think this will help much, because I think we'll want to have
    multiple cython files, and that we'll probably move individual
    functions between being implemented in C and Cython (including utility
    functions). So that means we need to solve the problem of mixing C and
    Cython files inside a single library.


Separating the pure C code into a static lib is the simple way of
achieving the same goal. Essentially, you write:

# implemented in npyinternal.a
_npy_internal_foo(....)

# implemented in merged_multiarray_umath.pyx
cdef PyArray_Foo(...):
     # use _npy_internal_foo()

then our merged_multiarray_umath.so is built by linking the .pyx and the
npyinternal.a together. IOW, the static link is internal.

Going through npyinternal.a instead of just linking .o from pure C and
Cython together gives us the following:

  1. the .a can just use normal linking strategies instead of the
awkward capsule thing. Capsules are easy to get wrong when using Cython,
as you may end up with multiple internal copies of the wrapped object
inside the capsule, causing hard-to-track bugs (this is what we wasted
most of our time on w/ Stefan and Kurt during ds4ds)
  2. the only public symbols in the .a are the ones needed by the Cython
wrapping, and since those are decorated with the _npy_internal_ prefix,
clashes are unlikely to happen
  3. since most of the code is already in the .a internally, supporting
static linking should be simpler: the only difference is how you
statically link the Cython-generated code. Because of 1, you are also
less likely to cause nasty surprises when putting everything together.



I don't see why static libraries for internals are being discussed at all.
There is not much difference between an .a (archive) file and a set of .o
(object) files: what you call a static library is just a collection of
object files with an index slapped on top for faster lookup. Whether a
symbol is exported or not is defined in the object file, not the archive
file, so in this regard a static library and a collection of .o files make
no difference. Our current system also produces such a library; the only
thing that's "missing" is bundling it into an archive via ar cru *.o

I also don't see how PyCapsule plays a role in this. You don't need a
PyCapsule to link a bunch of object files together.

So for me the issue is simply: what is easier with distutils -- getting
the list of object files to link against the cython file, or first
creating a static library from those object files and linking that
against the cython object? I don't think either way should be
particularly hard, so there is not really much to discuss. Do whatever
is easier or results in nicer code.


As for adding cython to numpy, I'd start with letting a cython file
provide the multiarraymodule init function, with all regular numpy object
files linked into that extension. Then we have a pyx file with minimal
bloat to get started, and it should also be independent of merging umath
(which I'm in favour of). When that single pyx module file gets too
large, concatenating multiple files together could probably work until
cython supports a split utility/user-code build.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
