On Fri, Feb 17, 2012 at 12:09 PM, Benjamin Root <ben.r...@ou.edu> wrote:
>
>
> On Fri, Feb 17, 2012 at 1:00 PM, Christopher Jordan-Squire
> <cjord...@uw.edu> wrote:
>
>> On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe <mwwi...@gmail.com> wrote:
>> > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing <efir...@hawaii.edu> wrote:
>> >>
>> >> On 02/17/2012 05:39 AM, Charles R Harris wrote:
>> >> >
>> >> >
>> >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau
>> >> > <courn...@gmail.com> wrote:
>> >> >
>> >> > Hi Travis,
>> >> >
>> >> > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant
>> >> > <tra...@continuum.io> wrote:
>> >> > > Mark Wiebe and I have been discussing off and on (as well as
>> >> > > talking with Charles) a good way forward to balance two
>> >> > > competing desires:
>> >> > >
>> >> > >     * addition of new features that are needed in NumPy
>> >> > >     * improving the code-base generally and moving towards a
>> >> > >       more maintainable NumPy
>> >> > >
>> >> > > I know there are loud voices for just focusing on the second of
>> >> > > these and avoiding the first until we have finished that. I
>> >> > > recognize the need to improve the code base, but I will also be
>> >> > > pushing for improvements to the feature-set and user experience
>> >> > > in the process.
>> >> > >
>> >> > > As a result, I am proposing a rough outline for releases over
>> >> > > the next year:
>> >> > >
>> >> > >     * NumPy 1.7 to come out as soon as the serious bugs can be
>> >> > >       eliminated. Bryan, Francesc, Mark, and I are able to help
>> >> > >       triage some of those.
>> >> > >
>> >> > >     * NumPy 1.8 to come out in July which will have as many
>> >> > >       ABI-compatible feature enhancements as we can add while
>> >> > >       improving test coverage and code cleanup. I will post to
>> >> > >       this list more details of what we plan to address with it
>> >> > >       later.
>> >> > >       Candidates for inclusion are:
>> >> > >         * resolving the NA/missing-data issues
>> >> > >         * finishing group-by
>> >> > >         * incorporating the start of label arrays
>> >> > >         * incorporating a meta-object
>> >> > >         * a few new dtypes (variable-length string,
>> >> > >           variable-length unicode and an enum type)
>> >> > >         * adding ufunc support for flexible dtypes and possibly
>> >> > >           structured arrays
>> >> > >         * allowing generalized ufuncs to work on more kinds of
>> >> > >           arrays besides just contiguous
>> >> > >         * improving the ability for NumPy to receive
>> >> > >           JIT-generated function pointers for ufuncs and other
>> >> > >           calculation opportunities
>> >> > >         * adding "filters" to Input and Output
>> >> > >         * simple computed fields for dtypes
>> >> > >         * accepting a Data-Type specification as a class or JSON
>> >> > >           file
>> >> > >         * work towards improving the dtype-addition mechanism
>> >> > >         * re-factoring of code so that it can compile with a C++
>> >> > >           compiler and be minimally dependent on Python
>> >> > >           data-structures.
>> >> >
>> >> > This is a pretty exciting list of features. What is the rationale
>> >> > for code being compiled as C++? IMO, it will be difficult to do so
>> >> > without preventing useful C constructs, and without removing some
>> >> > of the existing features (like our use of C99 complex). The subset
>> >> > that is both C and C++ compatible is quite constraining.
>> >> >
>> >> >
>> >> > I'm in favor of this myself. C++ would allow a lot of code cleanup
>> >> > and make it easier to provide an extensible base, and I think it
>> >> > would be a natural fit with numpy. Of course, some C++ projects
>> >> > become tangled messes of inheritance, but I'd be very interested
>> >> > in seeing what a good C++ designer like Mark, intimately familiar
>> >> > with the numpy code base, could do. This opportunity might not
>> >> > come by again anytime soon and I think we should grab onto it.
>> >> > The initial step would be a release whose code would compile as
>> >> > both C and C++, which mostly comes down to no longer using C++
>> >> > keywords like 'new' as identifiers.
>> >> >
>> >> > I did suggest running it by you for build issues, so please raise
>> >> > any you can think of. Note that MatPlotLib is in C++, so I don't
>> >> > think the problems are insurmountable. And choosing a set of
>> >> > compilers to support is something that will need to be done.
>> >>
>> >> It's true that matplotlib relies heavily on C++, both via the Agg
>> >> library and in its own extension code. Personally, I don't like
>> >> this; I think it raises the barrier to contributing. C++ is an order
>> >> of magnitude more complicated than C--harder to read, and much
>> >> harder to write, unless one is a true expert. In mpl it brings
>> >> reliance on the CXX library, which Mike D. has had to help maintain.
>> >> And if it does increase compiler specificity, that's bad.
>> >
>> >
>> > This gets to the recruitment issue, which is one of the most
>> > important problems I see numpy facing. I personally have contributed
>> > a lot of code to NumPy *in spite of* the fact it's in C. NumPy being
>> > in C instead of C++ was the biggest negative point when I considered
>> > whether it was worth contributing to the project. I suspect there are
>> > many programmers out there who are skilled in low-level,
>> > high-performance C++, who would be willing to contribute, but don't
>> > want to code in C.
>> >
>> > I believe NumPy should be trying to find people who want to make high
>> > performance, close to the metal, libraries. This is a very different
>> > type of programmer than one who wants to program in Python, but is
>> > willing to dabble in a lower level language to make something run
>> > faster.
>> > High performance library development is one of the things the C++
>> > developer community does very well, and that community is where we
>> > have a good chance of finding the programmers NumPy needs.
>> >
>> >> I would much rather see development in the direction of sticking
>> >> with C where direct low-level control and speed are needed, and
>> >> using cython to gain higher level language benefits where
>> >> appropriate. Of course, that brings in the danger of reliance on
>> >> another complex tool, cython. If that danger is considered
>> >> excessive, then just stick with C.
>> >
>> >
>> > There are many small benefits C++ can offer, even if numpy chooses
>> > only to use a tiny subset of the C++ language. For example, RAII can
>> > be used to reliably eliminate PyObject reference leaks.
>> >
>> > Consider a regression like this:
>> > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html
>> >
>> > Fixing this in C would require switching all the relevant usages of
>> > NPY_MAXARGS to use a dynamic memory allocation. This brings with it
>> > the potential of easily introducing a memory leak, and is a lot of
>> > work to do. In C++, this functionality could be placed inside a
>> > class, where the deterministic construction/destruction semantics
>> > eliminate the risk of memory leaks and make the code easier to read
>> > at the same time. There are other examples like this where the C
>> > language has forced a suboptimal design choice because of how hard
>> > it would be to do it better.
>> >
>> > Cheers,
>> > Mark
>> >
>>
>> In a similar vein, could incorporating C++ lead to a simpler low-level
>> API for numpy? I know Mark has talked before about--in the long-term,
>> as a dream project to scratch his own itch, and something the BDF12
>> doesn't necessarily agree with--implementing the great ideas in numpy
>> as a layered C++ library.
>> (Which would have the added benefit of making numpy more of a general
>> array library that could be exposed to any language which can call
>> C++ libraries.)
>>
>> I don't imagine that's on the table for anything near-term, but I
>> wonder if making more of the low-level stuff C++ would make it easier
>> for performance nuts to write their own code in C/C++ interfacing with
>> numpy, and then expose it to python. After playing around with ufuncs
>> at the C level for a little while last summer, I quickly realized any
>> simplifications would be greatly appreciated.
>>
>> -Chris
>>
>>
> I am also in favor of moving towards a C++ oriented library. Personally,
> I find C++ easier to read and understand, most likely because I learned
> it first. I only learned C in the context of learning C++.
>
> Just a thought, with the upcoming revisions to the C++ standard, this
> does open up the possibility of some nice templating features that would
> make the library easier to use in native C++ programs. On a side note,
> does anybody use std::valarray?
>
>

My impression is that std::valarray didn't really solve the problems it
was intended to solve. IIRC, the valarray author himself said as much, but
I don't recall where.

Chuck
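
For reference, the RAII point Mark makes in the quoted text (reference
leaks, and replacing fixed NPY_MAXARGS-sized arrays with a class that
cleans up after itself) might look roughly like the sketch below. It is
only an illustration under stated assumptions, not NumPy or CPython code:
FakeObject is a hypothetical stand-in for a reference-counted PyObject,
and OwnedRef, ScratchArray, process, and the inline capacity of 32 are
made-up names and values chosen for the example.

```cpp
#include <cstddef>

// Hypothetical stand-in for a reference-counted object; NOT the CPython API.
struct FakeObject {
    int refcount;
};

// Owns one reference for the lifetime of the guard. The destructor runs on
// every exit path, so the reference cannot be leaked by an early return.
class OwnedRef {
public:
    explicit OwnedRef(FakeObject *obj) : obj_(obj) {
        if (obj_) ++obj_->refcount;
    }
    ~OwnedRef() {
        if (obj_) --obj_->refcount;
    }
    OwnedRef(const OwnedRef &) = delete;
    OwnedRef &operator=(const OwnedRef &) = delete;
    FakeObject *get() const { return obj_; }

private:
    FakeObject *obj_;
};

// Scratch array that uses inline storage for up to kInline entries and
// falls back to the heap beyond that. The heap block, if any, is released
// in the destructor. kInline = 32 is an illustrative default.
template <typename T, std::size_t kInline = 32>
class ScratchArray {
public:
    explicit ScratchArray(std::size_t n)
        : size_(n), inline_(), data_(n <= kInline ? inline_ : new T[n]()) {}
    ~ScratchArray() {
        if (data_ != inline_) delete[] data_;
    }
    ScratchArray(const ScratchArray &) = delete;
    ScratchArray &operator=(const ScratchArray &) = delete;
    T &operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }
    bool on_heap() const { return data_ != inline_; }

private:
    std::size_t size_;
    T inline_[kInline];
    T *data_;
};

// Hypothetical worker with an early-return error path. Neither branch needs
// manual cleanup: ref and args are released when they go out of scope.
bool process(FakeObject *obj, std::size_t nargs) {
    OwnedRef ref(obj);
    ScratchArray<int> args(nargs);
    if (nargs == 0) {
        return false;  // early return; destructors still run
    }
    for (std::size_t i = 0; i < args.size(); ++i) {
        args[i] = static_cast<int>(i);
    }
    return args[nargs - 1] == static_cast<int>(nargs - 1);
}
```

In this sketch, calling process() with either zero arguments or more than
32 leaves the object's reference count where it started, and the heap
block used for the larger case is freed without any explicit cleanup code
in process() itself. The equivalent C would need a cleanup label or a
free/decref on each return path.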
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion