Re: [Numpy-discussion] Proposed Roadmap Overview

2012-03-01 Thread Richard Hattersley
+1 on the NEP guideline

As part of a team building a scientific analysis library, I'm
attempting to understand the current state of NumPy development and
its likely future (with a view to contributing if appropriate). The
proposed NEP process would make that a whole lot easier. And if
nothing else, it would reduce the chance of me posting questions about
topics that had already been discussed/decided!

Without such a process, the NEPs become another potential source of
confusion and mixed messages.


On 1 March 2012 03:02, Travis Oliphant wrote:
> I would like to hear the opinions of others on that point, but yes, I
> think that is an appropriate procedure.
>
> Travis
>
> --
> Travis Oliphant
> (on a mobile)
> 512-826-7480
>
>
> On Feb 29, 2012, at 10:54 AM, Matthew Brett  wrote:
>
> > Hi,
> >
> > On Wed, Feb 29, 2012 at 1:46 AM, Travis Oliphant  wrote:
> >> We already use the NEP process for such decisions.  This discussion
> >> came simply from the *idea* of writing such a NEP.
> >>
> >> Nothing has been decided.  Only opinions have been shared that might
> >> influence the NEP.  This is all pretty premature, though --- migration
> >> to C++ features on a trial branch is some months away were it to happen.
> >
> > Fernando can correct me if I'm wrong, but I think he was asking a
> > governance question.  That is: would you (as BDF$N) consider the
> > following guideline:
> >
> > "As a condition for accepting significant changes to Numpy, for each
> > significant change, there will be a NEP.  The NEP shall follow the
> > same model as the Python PEPs - that is - there will be a summary of
> > the changes, the issues arising, the for / against opinions and
> > alternatives offered.  There will usually be a draft implementation.
> > The NEP will contain the resolution of the discussion as it relates to
> > the code"
> >
> > For example, the masked array NEP, although very substantial, contains
> > little discussion of the controversy arising, or the intended
> > resolution of the controversy:
> >
> > https://github.com/numpy/numpy/blob/3f685a1a990f7b6e5149c80b52436fb4207e49f5/doc/neps/missing-data.rst
> >
> > I mean, although it is useful, it is not in the form of a PEP, as
> > Fernando has described it.
> >
> > Would you accept extending the guidelines to the NEP format?
> >
> > Best,
> >
> > Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-29 Thread Travis Oliphant
I would like to hear the opinions of others on that point, but yes, I think
that is an appropriate procedure.

Travis 

--
Travis Oliphant
(on a mobile)
512-826-7480


On Feb 29, 2012, at 10:54 AM, Matthew Brett  wrote:

> Hi,
> 
> On Wed, Feb 29, 2012 at 1:46 AM, Travis Oliphant  wrote:
>> We already use the NEP process for such decisions.  This discussion came
>> simply from the *idea* of writing such a NEP.
>> 
>> Nothing has been decided.  Only opinions have been shared that might 
>> influence the NEP.  This is all pretty premature, though ---  migration to 
>> C++ features on a trial branch is some months away were it to happen.
> 
> Fernando can correct me if I'm wrong, but I think he was asking a
> governance question.   That is: would you (as BDF$N) consider the
> following guideline:
> 
> "As a condition for accepting significant changes to Numpy, for each
> significant change, there will be a NEP.  The NEP shall follow the
> same model as the Python PEPs - that is - there will be a summary of
> the changes, the issues arising, the for / against opinions and
> alternatives offered.  There will usually be a draft implementation.
> The NEP will contain the resolution of the discussion as it relates to
> the code"
> 
> For example, the masked array NEP, although very substantial, contains
> little discussion of the controversy arising, or the intended
> resolution of the controversy:
> 
> https://github.com/numpy/numpy/blob/3f685a1a990f7b6e5149c80b52436fb4207e49f5/doc/neps/missing-data.rst
> 
> I mean, although it is useful, it is not in the form of a PEP, as
> Fernando has described it.
> 
> Would you accept extending the guidelines to the NEP format?
> 
> Best,
> 
> Matthew


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-29 Thread John Hunter
On Wed, Feb 29, 2012 at 1:20 PM, Neal Becker  wrote:

>
> Many of Linus's complaints have to do with the use of C++ in the _kernel_.
> These objections are quite different for an _application_.  For example,
> there are issues with the need for support libraries for exception
> handling.  Not an issue for an application.

Actually, the thread was on the git mailing list, and many of his
complaints were addressing the appropriateness of C++ for git
development.


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-29 Thread Neal Becker
Charles R Harris wrote:

> On Tue, Feb 28, 2012 at 12:05 PM, John Hunter  wrote:
> 
>> On Sat, Feb 18, 2012 at 5:09 PM, David Cournapeau wrote:
>>
>>>
>>> There are better languages than C++ that have most of the technical
>>>
>>> benefits stated in this discussion (rust and D being the most
>>> "obvious" ones), but whose usage is unrealistic today for various
>>> reasons: knowledge, availability on "esoteric" platforms, etc… A new
>>> language is completely ridiculous.
>>>
>>
>>
>> I just saw this for the first time today: Linus Torvalds on C++ (
>> http://harmful.cat-v.org/software/c++/linus).  The post is from 2007 so
>> many of you may have seen it, but I thought it was entertaining enough and
>> on-topic enough with this thread that I'd share it in case you haven't.
>>
>>
>> The point he makes:
>>
>>   In other words, the only way to do good, efficient, and system-level and
>>   portable C++ ends up to limit yourself to all the things that
>> are basically
>>   available in C
>>
>> was interesting to me because the best C++ library I have ever worked with
>> (agg) imports *nothing* except standard C libs (no standard template
>> library).  In fact, the only includes external to itself
>> are math.h, stdlib.h, stdio.h, and string.h.
>>
>> To shoehorn Jamie Zawinski's famous regex quote (
>> http://regex.info/blog/2006-09-15/247).  "Some people, when confronted
>> with a problem, think “I know, I'll use boost.”   Now they have two
>> problems."
>>
>> Here is the Linus post:
>>
>> From: Linus Torvalds  linux-foundation.org>
>> Subject: Re: [RFC] Convert builin-mailinfo.c to use The Better String
>> Library.
>> Newsgroups: gmane.comp.version-control.git
>> Date: 2007-09-06 17:50:28 GMT (2 years, 14 weeks, 16 hours and 36 minutes
>> ago)
>>
>> On Wed, 5 Sep 2007, Dmitry Kakurin wrote:
>> >
>> > When I first looked at Git source code two things struck me as odd:
>> > 1. Pure C as opposed to C++. No idea why. Please don't talk about
>> portability,
>> > it's BS.
>>
>> *YOU* are full of bullshit.
>>
>> C++ is a horrible language. It's made more horrible by the fact that a lot
>> of substandard programmers use it, to the point where it's much much
>> easier to generate total and utter crap with it. Quite frankly, even if
>> the choice of C were to do *nothing* but keep the C++ programmers out,
>> that in itself would be a huge reason to use C.
>>
>> In other words: the choice of C is the only sane choice. I know Miles
>> Bader jokingly said "to piss you off", but it's actually true. I've come
>> to the conclusion that any programmer that would prefer the project to be
>> in C++ over C is likely a programmer that I really *would* prefer to piss
>> off, so that he doesn't come and screw up any project I'm involved with.
>>
>> C++ leads to really really bad design choices. You invariably start using
>> the "nice" library features of the language like STL and Boost and other
>> total and utter crap, that may "help" you program, but causes:
>>
>>  - infinite amounts of pain when they don't work (and anybody who tells me
>>that STL and especially Boost are stable and portable is just so full
>>of BS that it's not even funny)
>>
>>  - inefficient abstracted programming models where two years down the road
>>you notice that some abstraction wasn't very efficient, but now all
>>your code depends on all the nice object models around it, and you
>>cannot fix it without rewriting your app.
>>
>> In other words, the only way to do good, efficient, and system-level and
>> portable C++ ends up to limit yourself to all the things that are
>> basically available in C. And limiting your project to C means that people
>> don't screw that up, and also means that you get a lot of programmers that
>> do actually understand low-level issues and don't screw things up with any
>> idiotic "object model" crap.
>>
>> So I'm sorry, but for something like git, where efficiency was a primary
>> objective, the "advantages" of C++ is just a huge mistake. The fact that
>> we also piss off people who cannot see that is just a big additional
>> advantage.
>>
>> If you want a VCS that is written in C++, go play with Monotone. Really.
>> They use a "real database". They use "nice object-oriented libraries".
>> They use "nice C++ abstractions". And quite frankly, as a result of all
>> these design decisions that sound so appealing to some CS people, the end
>> result is a horrible and unmaintainable mess.
>>
>> But I'm sure you'd like it more than git.
>>
>>
> Yeah, Linus doesn't like C++. No doubt that is in part because of the
> attempt to rewrite Linux in C++ back in the early 90's and the resulting
> compiler and portability problems. Linus also writes C like it was his
> native tongue, he likes to work close to the metal, and he'd probably
> prefer it over Python for most problems ;) Things have improved in the
> compiler department, and I think C++ really wasn't much of an improvement
> over C until templates and

Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-29 Thread Matthew Brett
Hi,

On Wed, Feb 29, 2012 at 1:46 AM, Travis Oliphant  wrote:
> We already use the NEP process for such decisions.  This discussion came
> simply from the *idea* of writing such a NEP.
>
> Nothing has been decided.  Only opinions have been shared that might 
> influence the NEP.  This is all pretty premature, though ---  migration to 
> C++ features on a trial branch is some months away were it to happen.

Fernando can correct me if I'm wrong, but I think he was asking a
governance question.   That is: would you (as BDF$N) consider the
following guideline:

"As a condition for accepting significant changes to Numpy, for each
significant change, there will be a NEP.  The NEP shall follow the
same model as the Python PEPs - that is - there will be a summary of
the changes, the issues arising, the for / against opinions and
alternatives offered.  There will usually be a draft implementation.
The NEP will contain the resolution of the discussion as it relates to
the code"

For example, the masked array NEP, although very substantial, contains
little discussion of the controversy arising, or the intended
resolution of the controversy:

https://github.com/numpy/numpy/blob/3f685a1a990f7b6e5149c80b52436fb4207e49f5/doc/neps/missing-data.rst

I mean, although it is useful, it is not in the form of a PEP, as
Fernando has described it.

Would you accept extending the guidelines to the NEP format?

Best,

Matthew


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-29 Thread Fernando Perez
On Tue, Feb 28, 2012 at 11:28 PM, Mark Wiebe  wrote:
> The development approach I really like is to start with a relatively rough
> NEP, then cycle through feedback, updating the NEP, and implementation.
> Organizing one's thoughts to describe them in a design document can often
> clarify things that are confusing when just looking at code. Feedback from
> the community, both developers and users, can help expose where your
> assumptions are and often lead to insights from subjects you didn't even
> know about. Implementation puts those ideas through a cold, hard,
> reality check, and can provide a hands-on experience for later rounds of
> feedback.

> This iterative process is the most important thing to emphasize: the design document
> and the code must both evolve together. Stamping a NEP as "final" before
> getting into code is just as bad as jumping into code without writing a
> preliminary design.

Certainly! We're in complete agreement here.  I didn't mean to suggest
(though perhaps I phrased it poorly) that the nep discussion and
implementation phases should be fully disjoint, since I do believe
that implementation and discussion can and should inform each other.


> Github actually has a bug that the RST table of contents is stripped, and
> this makes reading longer NEPS right in the repository uncomfortable. Maybe
> alternatives to a git repository for NEPs should be considered. I reported
> the bug to github, but they told me that was just how they did things.

That's easy to solve, and can be done with a minimum of work in a way
that will make the nep-handling process far easier:

- split the neps into their own repo, and make that a repo targeted
for building a website, like we do with the ipython docs for example.

- have a 'nep repo manager' who merges PRs from nep authors quickly.
In practice, nep authors could even be given write access to the repo
while they work on their own nep, I think we can trust people not to
mess around outside their directory.

- the nep repo is source-only, and we have a nep-web repo where the
*built* neps are displayed using the gh-pages mechanism.

With this, we achieve something like what python uses, with a separate
and nicely formatted web version of the neps for easy reading, but in
addition with the fluidity of the github workflow for source
management.

We already have all the pieces for this, so it would be a very easy
job for someone to make it happen (~2 hours at most, would be my guess).

Cheers,

f


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-28 Thread Mark Wiebe
On Tue, Feb 28, 2012 at 11:03 PM, Fernando Perez wrote:

> On Tue, Feb 28, 2012 at 10:46 PM, Travis Oliphant 
> wrote:
> > We already use the NEP process for such decisions.   This discussion
> came simply from the *idea* of writing such a NEP.
> >
> > Nothing has been decided.  Only opinions have been shared that might
> influence the NEP.  This is all pretty premature, though ---  migration to
> C++ features on a trial branch is some months away were it to happen.
>
> Sure, I know we do have neps, they live in the main numpy repo (which
> btw, I think they should be moved to a standalone repo to make their
> management independent of the core code, but that's an easy and minor
> point we can ignore for now). I was just thinking that this discussion
> is precisely the kind of thing that would be well served by being
> organized in a nep, before even jumping into implementation.
>
> A nep can precisely help organize a discussion where there's enough to
> think about and make decisions *before* effort has gone into
> implementing anything.  It's important not to forget that once someone
> goes far enough down the road of implementing something, this adds
> pressure to turn the implementation into a fait accompli, simply out
> of not wanting to throw work away.
>
> For a decision as binary as 'rewrite the core in C++ or not', it would
> seem to me that organizing the problem in a NEP *before* starting to
> implement something in a trial branch would be precisely the way to
> go, and that it would actually make the decision process and
> discussion easier and more productive.
>

The development approach I really like is to start with a relatively rough
NEP, then cycle through feedback, updating the NEP, and implementation.
Organizing one's thoughts to describe them in a design document can often
clarify things that are confusing when just looking at code. Feedback from
the community, both developers and users, can help expose where your
assumptions are and often lead to insights from subjects you didn't even
know about. Implementation puts those ideas through a cold, hard,
reality check, and can provide a hands-on experience for later rounds of
feedback.

This iterative process is the most important thing to emphasize: the design document
and the code must both evolve together. Stamping a NEP as "final" before
getting into code is just as bad as jumping into code without writing a
preliminary design.

For the decision about adopting C++, a NEP proposing how we would go about
doing it, which evolves as the community gains experience with the idea,
will be very helpful. I would emphasize that the adoption of C++ does not
require a rewrite. The patch required to make NumPy build with a C++
compiler is very small, and individual features of C++ can be adopted
slowly, in a piecemeal fashion. What I'm advocating for is this kind of
gradual evolution, and my starting point for writing a NEP would be the
email I wrote here:

http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060778.html
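To make the "piecemeal adoption" idea concrete, here is a purely hypothetical sketch (none of these names are actual NumPy source): an existing C-style API is kept intact behind extern "C", so current callers and the Python C-API glue see the same symbols, while the implementation behind it quietly adopts a single C++ feature, a function template that replaces a family of near-identical per-dtype C functions.

```cpp
#include <cstddef>

namespace detail {
    // One template replaces several hand-copied C functions that
    // differed only in their element type.
    template <typename T>
    T sum_impl(const T* data, std::size_t n) {
        T total = T(0);
        for (std::size_t i = 0; i < n; ++i) {
            total += data[i];
        }
        return total;
    }
}

// The C-visible entry points keep C linkage and the old calling
// convention; existing callers do not notice the change underneath.
extern "C" double npy_sum_double(const double* data, std::size_t n) {
    return detail::sum_impl(data, n);
}

extern "C" long npy_sum_long(const long* data, std::size_t n) {
    return detail::sum_impl(data, n);
}
```

The point of the sketch is that nothing in the public interface changes; only the repetition inside the implementation is removed, which is the kind of gradual, low-risk step Mark is advocating.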

Github actually has a bug that the RST table of contents is stripped, and
this makes reading longer NEPS right in the repository uncomfortable. Maybe
alternatives to a git repository for NEPs should be considered. I reported
the bug to github, but they told me that was just how they did things.

Cheers,
Mark



>
> Cheers,
>
> f


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-28 Thread Fernando Perez
On Tue, Feb 28, 2012 at 10:46 PM, Travis Oliphant  wrote:
> We already use the NEP process for such decisions.  This discussion came
> simply from the *idea* of writing such a NEP.
>
> Nothing has been decided.  Only opinions have been shared that might 
> influence the NEP.  This is all pretty premature, though ---  migration to 
> C++ features on a trial branch is some months away were it to happen.

Sure, I know we do have neps, they live in the main numpy repo (which
btw, I think they should be moved to a standalone repo to make their
management independent of the core code, but that's an easy and minor
point we can ignore for now). I was just thinking that this discussion
is precisely the kind of thing that would be well served by being
organized in a nep, before even jumping into implementation.

A nep can precisely help organize a discussion where there's enough to
think about and make decisions *before* effort has gone into
implementing anything.  It's important not to forget that once someone
goes far enough down the road of implementing something, this adds
pressure to turn the implementation into a fait accompli, simply out
of not wanting to throw work away.

For a decision as binary as 'rewrite the core in C++ or not', it would
seem to me that organizing the problem in a NEP *before* starting to
implement something in a trial branch would be precisely the way to
go, and that it would actually make the decision process and
discussion easier and more productive.

Cheers,

f


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-28 Thread Travis Oliphant
We already use the NEP process for such decisions.  This discussion came
simply from the *idea* of writing such a NEP.

Nothing has been decided.  Only opinions have been shared that might influence 
the NEP.  This is all pretty premature, though ---  migration to C++ features 
on a trial branch is some months away were it to happen.

Travis 

--
Travis Oliphant
(on a mobile)
512-826-7480


On Feb 28, 2012, at 9:51 PM, Fernando Perez  wrote:

> On Tue, Feb 28, 2012 at 4:49 PM, Bryan Van de Ven  wrote:
>> Just my own $0.02 regarding this issue: I am in favor of using C++ for
>> numpy, I think it could confer various benefits. However, I am also in
>> favor of explicitly deciding and documenting what subset of C++ features
>> are acceptable for use within the numpy codebase.
> 
> I would *love* to see us adopt the NEP/PEP process for decisions as
> complex as this one.  The PEP process serves the Python community very
> well, and I think it's an excellent balance of minimal overhead and
> maximum benefit for organizing the process of making
> complex/controversial decisions.  PEP/NEPs serve a number of important
> purposes:
> 
> - they encourage the proponent of the idea to organize the initial
> presentation in a concrete, easy to follow way that can be used for
> decision making.
> 
> - they serve as a stable reference of the key points in a discussion,
> in contrast to the meandering that is normal for a mailing list thread.
> 
> - they can be updated and evolve as the discussion happens,
> incorporating the distilled ideas that result.
> 
> - if important new points are brought up in the discussion, the
> community can ensure that they are added to the NEP.
> 
> - once a decision is reached, the NEP is updated with the rationale
> for the decision.  Whether it's acceptance or rejection, this ensures
> that in the future, others can come back to this document to see the
> reasons, avoiding repetitive discussions.
> 
> - the NEP can serve as documentation for a specific feature; we see
> this often in Python, where the standard docs refer to PEPs for
> details.
> 
> - over time, these documents build a history of the key decisions in
> the design of a project, in a way that is much easier to read and
> reason about than a random splatter of long mailing list threads.
> 
> 
> I was offline when the long discussions on process happened a few
> weeks ago, and it's not my intent to dig into every point brought up
> there.  I'm only proposing that we adopt the NEP process for complex
> decisions, of which the C++ shift is certainly one.
> 
> In the end, I think the NEP process will actually *help* the
> discussion process.  It helps keep the key points on focus even as the
> discussion may drift in the mailing list, which means ultimately
> everyone wastes less energy.
> 
> I obviously can't force anyone to do this, but for what it's worth, I
> know that at least for IPython, I've had this in mind for a while.  We
> haven't had any majorly contentious decisions that really need it yet,
> but for example I have in mind a redesign and extension of the magic
> system that I intend to write-up pep-style.  While I suspect nobody
> would yell if I just went ahead and implemented it on a pull request,
> there are enough moving parts and new ideas that I want to gather
> feedback in an organized manner before proceeding with implementation.
> And I don't find that idea to be a burden, I actually do think it
> will make the whole thing go more smoothly even for me.
> 
> Just a thought...
> 
> f


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-28 Thread Fernando Perez
On Tue, Feb 28, 2012 at 4:49 PM, Bryan Van de Ven  wrote:
> Just my own $0.02 regarding this issue: I am in favor of using C++ for
> numpy, I think it could confer various benefits. However, I am also in
> favor of explicitly deciding and documenting what subset of C++ features
> are acceptable for use within the numpy codebase.

I would *love* to see us adopt the NEP/PEP process for decisions as
complex as this one.  The PEP process serves the Python community very
well, and I think it's an excellent balance of minimal overhead and
maximum benefit for organizing the process of making
complex/controversial decisions.  PEP/NEPs serve a number of important
purposes:

- they encourage the proponent of the idea to organize the initial
presentation in a concrete, easy to follow way that can be used for
decision making.

- they serve as a stable reference of the key points in a discussion,
in contrast to the meandering that is normal for a mailing list thread.

- they can be updated and evolve as the discussion happens,
incorporating the distilled ideas that result.

- if important new points are brought up in the discussion, the
community can ensure that they are added to the NEP.

- once a decision is reached, the NEP is updated with the rationale
for the decision.  Whether it's acceptance or rejection, this ensures
that in the future, others can come back to this document to see the
reasons, avoiding repetitive discussions.

- the NEP can serve as documentation for a specific feature; we see
this often in Python, where the standard docs refer to PEPs for
details.

- over time, these documents build a history of the key decisions in
the design of a project, in a way that is much easier to read and
reason about than a random splatter of long mailing list threads.


I was offline when the long discussions on process happened a few
weeks ago, and it's not my intent to dig into every point brought up
there.  I'm only proposing that we adopt the NEP process for complex
decisions, of which the C++ shift is certainly one.

In the end, I think the NEP process will actually *help* the
discussion process.  It helps keep the key points on focus even as the
discussion may drift in the mailing list, which means ultimately
everyone wastes less energy.

I obviously can't force anyone to do this, but for what it's worth, I
know that at least for IPython, I've had this in mind for a while.  We
haven't had any majorly contentious decisions that really need it yet,
but for example I have in mind a redesign and extension of the magic
system that I intend to write-up pep-style.  While I suspect nobody
would yell if I just went ahead and implemented it on a pull request,
there are enough moving parts and new ideas that I want to gather
feedback in an organized manner before proceeding with implementation.
 And I don't find that idea to be a burden, I actually do think it
will make the whole thing go more smoothly even for me.

Just a thought...

f


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-28 Thread Bryan Van de Ven
On 2/28/12 4:09 PM, Russell E. Owen wrote:
> I can't imagine working in C anymore and doing without exception 
> handling and namespaces. So I'm sorry to hear that C++ is not being 
> considered for a numpy rewrite. -- Russell
AFAIK C++ is still being considered for numpy in the future, and I think 
it is safe to say that a concrete implementation will be put forward for 
consideration at some point.

Just my own $0.02 regarding this issue: I am in favor of using C++ for 
numpy, I think it could confer various benefits. However, I am also in 
favor of explicitly deciding and documenting what subset of C++ features 
are acceptable for use within the numpy codebase.

Bryan Van de Ven



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-28 Thread Russell E. Owen
David Cournapeau  wrote:

> On Sat, Feb 18, 2012 at 10:50 PM, Sturla Molden  wrote:
> 
> >> In an ideal world, we would have a better language than C++ that can
> >> be spit out as C for portability.
> >
> > What about a statically typed Python? (That is, not Cython.) We just
> > need to make the compiler :-)
> 
> There are better languages than C++ that have most of the technical
> benefits stated in this discussion (rust and D being the most
> "obvious" ones), but whose usage is unrealistic today for various
> reasons: knowledge, availability on "esoteric" platforms, etc… A new
> language is completely ridiculous.

I just want to say that C++ has come a long way. I used to hate it, but 
it has matured, and using some basic features of boost 
(especially shared_ptr) can turn it into a really nice language. The 
next version will be even better, but one can write nice C++ today.

shared_ptr allows objects to easily manage their own memory (a basic form 
of automatic memory management via reference counting).

Generic programming seems like a really good fit to numpy's array types.
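As a toy, self-contained illustration of those two points (hypothetical code, not anything from numpy), the following sketch uses std::shared_ptr, the standardized descendant of the boost::shared_ptr mentioned above, so that the last owner of a buffer frees it with no explicit delete, and a function template supplies the generic programming over element types:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// A toy array whose storage is reference-counted: copies share one
// buffer, and the storage is released when the last owner goes away.
template <typename T>
class Array {
public:
    explicit Array(std::size_t n)
        : data_(std::make_shared<std::vector<T>>(n)) {}

    T& operator[](std::size_t i) { return (*data_)[i]; }
    std::size_t size() const { return data_->size(); }

    // How many Array objects currently own this buffer.
    long use_count() const { return data_.use_count(); }

private:
    std::shared_ptr<std::vector<T>> data_;
};

// A generic algorithm: works for any element type with operator+=.
template <typename T>
T total(Array<T> a) {  // by value: the refcount bumps for the call
    T acc = T(0);
    for (std::size_t i = 0; i < a.size(); ++i) acc += a[i];
    return acc;
}
```

Copies of Array share one buffer and bump a reference count, loosely like a view; there is no manual delete anywhere, which is the "basic automatic memory management" being described.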

I am part of a large project that codes in C++ and Python and we find it 
works very well for us.

I can't imagine working in C anymore and doing without exception 
handling and namespaces. So I'm sorry to hear that C++ is not being 
considered for a numpy rewrite.

-- Russell



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-28 Thread Charles R Harris
On Tue, Feb 28, 2012 at 2:34 PM, Dag Sverre Seljebotn <
d.s.seljeb...@astro.uio.no> wrote:

> On 02/28/2012 11:05 AM, John Hunter wrote:
> > On Sat, Feb 18, 2012 at 5:09 PM, David Cournapeau wrote:
> >
> >
> > There are better languages than C++ that have most of the technical
> > benefits stated in this discussion (rust and D being the most
> > "obvious" ones), but whose usage is unrealistic today for various
> > reasons: knowledge, availability on "esoteric" platforms, etc… A new
> > language is completely ridiculous.
> >
> >
> >
> > I just saw this for the first time today: Linus Torvalds on C++
> > (http://harmful.cat-v.org/software/c++/linus).  The post is from 2007 so
> > many of you may have seen it, but I thought it was entertaining enough
> > and on-topic enough with this thread that I'd share it in case you
> haven't.
> >
> >
> > The point he makes:
> >
> >In other words, the only way to do good, efficient, and system-level
> and
> >portable C++ ends up to limit yourself to all the things that
> > are basically
> >available in C
> >
> > was interesting to me because the best C++ library I have ever worked
> > with (agg) imports *nothing* except standard C libs (no standard
> > template library).  In fact, the only includes external to itself
> > are math.h, stdlib.h, stdio.h, and string.h.
> >
> > To shoehorn Jamie Zawinski's famous regex quote
> > (http://regex.info/blog/2006-09-15/247). "Some people, when confronted
> > with a problem, think “I know, I'll use boost.”   Now they have two
> > problems."
>
>
> In the same vein, this one neatly sums up all the bad sides of C++.
>
> (I don't really want to enter the language discussion. But this list is
> a nice list of the cons, and perhaps that can save discussion time
> because people don't have to enumerate those reasons again on this list?)
>
> http://yosefk.com/c++fqa/defective.html
>
>
Heh, I was hoping for something good, but that was kinda unfair. OK, so C++
isn't Java or C# or Python, no garbage collection or introspection or
whatever, but so what. Destructors are called as the exception unwinds up
the call stack, etc. That list is sort of the opposite end of the critical
spectrum from Linus (C++ does too much) and is more like a complaint that
C++ doesn't walk the dog. Can't satisfy everyone ;)



Chuck.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-28 Thread Dag Sverre Seljebotn
On 02/28/2012 11:05 AM, John Hunter wrote:
> On Sat, Feb 18, 2012 at 5:09 PM, David Cournapeau wrote:
>
>
> There are better languages than C++ that have most of the technical
> benefits stated in this discussion (rust and D being the most
> "obvious" ones), but whose usage is unrealistic today for various
> reasons: knowledge, availability on "esoteric" platforms, etc… A new
> language is completely ridiculous.
>
>
>
> I just saw this for the first time today: Linus Torvalds on C++
> (http://harmful.cat-v.org/software/c++/linus).  The post is from 2007 so
> many of you may have seen it, but I thought it was entertaining enough
> and on-topic enough with this thread that I'd share it in case you haven't.
>
>
> The point he makes:
>
>In other words, the only way to do good, efficient, and system-level and
>portable C++ ends up to limit yourself to all the things that
> are basically
>available in C
>
> was interesting to me because the best C++ library I have ever worked
> with (agg) imports *nothing* except standard C libs (no standard
> template library).  In fact, the only includes external to
> itself are math.h, stdlib.h, stdio.h, and string.h.
>
> To shoehorn Jamie Zawinski's famous regex quote
> (http://regex.info/blog/2006-09-15/247). "Some people, when confronted
> with a problem, think “I know, I'll use boost.”   Now they have two
> problems."


In the same vein, this one neatly sums up all the bad sides of C++.

(I don't really want to enter the language discussion. But this list is 
a nice list of the cons, and perhaps that can save discussion time 
because people don't have to enumerate those reasons again on this list?)

http://yosefk.com/c++fqa/defective.html

Dag


>
> Here is the Linus post:
>
> From: Linus Torvalds  linux-foundation.org
> >
> Subject: Re: [RFC] Convert builin-mailinfo.c to use The Better String
> Library.
> Newsgroups: gmane.comp.version-control.git
> Date: 2007-09-06 17:50:28 GMT (2 years, 14 weeks, 16 hours and 36
> minutes ago)
>
> On Wed, 5 Sep 2007, Dmitry Kakurin wrote:
>  >
>  > When I first looked at Git source code two things struck me as odd:
>  > 1. Pure C as opposed to C++. No idea why. Please don't talk about
> portability,
>  > it's BS.
>
> *YOU* are full of bullshit.
>
> C++ is a horrible language. It's made more horrible by the fact that a lot
> of substandard programmers use it, to the point where it's much much
> easier to generate total and utter crap with it. Quite frankly, even if
> the choice of C were to do *nothing* but keep the C++ programmers out,
> that in itself would be a huge reason to use C.
>
> In other words: the choice of C is the only sane choice. I know Miles
> Bader jokingly said "to piss you off", but it's actually true. I've come
> to the conclusion that any programmer that would prefer the project to be
> in C++ over C is likely a programmer that I really *would* prefer to piss
> off, so that he doesn't come and screw up any project I'm involved with.
>
> C++ leads to really really bad design choices. You invariably start using
> the "nice" library features of the language like STL and Boost and other
> total and utter crap, that may "help" you program, but causes:
>
>   - infinite amounts of pain when they don't work (and anybody who tells me
> that STL and especially Boost are stable and portable is just so full
> of BS that it's not even funny)
>
>   - inefficient abstracted programming models where two years down the road
> you notice that some abstraction wasn't very efficient, but now all
> your code depends on all the nice object models around it, and you
> cannot fix it without rewriting your app.
>
> In other words, the only way to do good, efficient, and system-level and
> portable C++ ends up to limit yourself to all the things that are
> basically available in C. And limiting your project to C means that people
> don't screw that up, and also means that you get a lot of programmers that
> do actually understand low-level issues and don't screw things up with any
> idiotic "object model" crap.
>
> So I'm sorry, but for something like git, where efficiency was a primary
> objective, the "advantages" of C++ is just a huge mistake. The fact that
> we also piss off people who cannot see that is just a big additional
> advantage.
>
> If you want a VCS that is written in C++, go play with Monotone. Really.
> They use a "real database". They use "nice object-oriented libraries".
> They use "nice C++ abstractions". And quite frankly, as a result of all
> these design decisions that sound so appealing to some CS people, the end
> result is a horrible and unmaintainable mess.
>
> But I'm sure you'd like it more than git.
>
>  Linus
>
>
>
>

Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-28 Thread Charles R Harris
On Tue, Feb 28, 2012 at 12:05 PM, John Hunter  wrote:

> On Sat, Feb 18, 2012 at 5:09 PM, David Cournapeau wrote:
>
>>
>> There are better languages than C++ that have most of the technical
>>
>> benefits stated in this discussion (rust and D being the most
>> "obvious" ones), but whose usage is unrealistic today for various
>> reasons: knowledge, availability on "esoteric" platforms, etc… A new
>> language is completely ridiculous.
>>
>
>
> I just saw this for the first time today: Linus Torvalds on C++ (
> http://harmful.cat-v.org/software/c++/linus).  The post is from 2007 so
> many of you may have seen it, but I thought it was entertaining enough and
> on-topic enough with this thread that I'd share it in case you haven't.
>
>
> The point he makes:
>
>   In other words, the only way to do good, efficient, and system-level and
>   portable C++ ends up to limit yourself to all the things that
> are basically
>   available in C
>
> was interesting to me because the best C++ library I have ever worked with
> (agg) imports *nothing* except standard C libs (no standard template
> library).  In fact, the only includes external to itself
> are math.h, stdlib.h, stdio.h, and string.h.
>
> To shoehorn Jamie Zawinski's famous regex quote (
> http://regex.info/blog/2006-09-15/247).  "Some people, when confronted
> with a problem, think “I know, I'll use boost.”   Now they have two
> problems."
>
> Here is the Linus post:
>
> From: Linus Torvalds  linux-foundation.org>
> Subject: Re: [RFC] Convert builin-mailinfo.c to use The Better String
> Library.
> Newsgroups: gmane.comp.version-control.git
> Date: 2007-09-06 17:50:28 GMT (2 years, 14 weeks, 16 hours and 36 minutes
> ago)
>
> On Wed, 5 Sep 2007, Dmitry Kakurin wrote:
> >
> > When I first looked at Git source code two things struck me as odd:
> > 1. Pure C as opposed to C++. No idea why. Please don't talk about
> portability,
> > it's BS.
>
> *YOU* are full of bullshit.
>
> C++ is a horrible language. It's made more horrible by the fact that a lot
> of substandard programmers use it, to the point where it's much much
> easier to generate total and utter crap with it. Quite frankly, even if
> the choice of C were to do *nothing* but keep the C++ programmers out,
> that in itself would be a huge reason to use C.
>
> In other words: the choice of C is the only sane choice. I know Miles
> Bader jokingly said "to piss you off", but it's actually true. I've come
> to the conclusion that any programmer that would prefer the project to be
> in C++ over C is likely a programmer that I really *would* prefer to piss
> off, so that he doesn't come and screw up any project I'm involved with.
>
> C++ leads to really really bad design choices. You invariably start using
> the "nice" library features of the language like STL and Boost and other
> total and utter crap, that may "help" you program, but causes:
>
>  - infinite amounts of pain when they don't work (and anybody who tells me
>that STL and especially Boost are stable and portable is just so full
>of BS that it's not even funny)
>
>  - inefficient abstracted programming models where two years down the road
>you notice that some abstraction wasn't very efficient, but now all
>your code depends on all the nice object models around it, and you
>cannot fix it without rewriting your app.
>
> In other words, the only way to do good, efficient, and system-level and
> portable C++ ends up to limit yourself to all the things that are
> basically available in C. And limiting your project to C means that people
> don't screw that up, and also means that you get a lot of programmers that
> do actually understand low-level issues and don't screw things up with any
> idiotic "object model" crap.
>
> So I'm sorry, but for something like git, where efficiency was a primary
> objective, the "advantages" of C++ is just a huge mistake. The fact that
> we also piss off people who cannot see that is just a big additional
> advantage.
>
> If you want a VCS that is written in C++, go play with Monotone. Really.
> They use a "real database". They use "nice object-oriented libraries".
> They use "nice C++ abstractions". And quite frankly, as a result of all
> these design decisions that sound so appealing to some CS people, the end
> result is a horrible and unmaintainable mess.
>
> But I'm sure you'd like it more than git.
>
>
Yeah, Linus doesn't like C++. No doubt that is in part because of the
attempt to rewrite Linux in C++ back in the early 90's and the resulting
compiler and portability problems. Linus also writes C like it was his
native tongue, he likes to work close to the metal, and he'd probably
prefer it over Python for most problems ;) Things have improved in the
compiler department, and I think C++ really wasn't much of an improvement
over C until templates and the STL came along. The boost smart pointers are
also really nice. OTOH, it is really easy to write awful C++ because of the
way inheritance 

Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-28 Thread John Hunter
On Sat, Feb 18, 2012 at 5:09 PM, David Cournapeau wrote:

>
> There are better languages than C++ that have most of the technical
> benefits stated in this discussion (rust and D being the most
> "obvious" ones), but whose usage is unrealistic today for various
> reasons: knowledge, availability on "esoteric" platforms, etc… A new
> language is completely ridiculous.



I just saw this for the first time today: Linus Torvalds on C++ (
http://harmful.cat-v.org/software/c++/linus).  The post is from 2007 so
many of you may have seen it, but I thought it was entertaining enough and
on-topic enough with this thread that I'd share it in case you haven't.


The point he makes:

  In other words, the only way to do good, efficient, and system-level and
  portable C++ ends up to limit yourself to all the things that
are basically
  available in C

was interesting to me because the best C++ library I have ever worked with
(agg) imports *nothing* except standard C libs (no standard template
library).  In fact, the only includes external to itself
are math.h, stdlib.h, stdio.h, and string.h.

To shoehorn Jamie Zawinski's famous regex quote (
http://regex.info/blog/2006-09-15/247).  "Some people, when confronted with
a problem, think “I know, I'll use boost.”   Now they have two problems."

Here is the Linus post:

From: Linus Torvalds  linux-foundation.org>
Subject: Re: [RFC] Convert builin-mailinfo.c to use The Better String
Library.
Newsgroups: gmane.comp.version-control.git
Date: 2007-09-06 17:50:28 GMT (2 years, 14 weeks, 16 hours and 36 minutes
ago)

On Wed, 5 Sep 2007, Dmitry Kakurin wrote:
>
> When I first looked at Git source code two things struck me as odd:
> 1. Pure C as opposed to C++. No idea why. Please don't talk about
portability,
> it's BS.

*YOU* are full of bullshit.

C++ is a horrible language. It's made more horrible by the fact that a lot
of substandard programmers use it, to the point where it's much much
easier to generate total and utter crap with it. Quite frankly, even if
the choice of C were to do *nothing* but keep the C++ programmers out,
that in itself would be a huge reason to use C.

In other words: the choice of C is the only sane choice. I know Miles
Bader jokingly said "to piss you off", but it's actually true. I've come
to the conclusion that any programmer that would prefer the project to be
in C++ over C is likely a programmer that I really *would* prefer to piss
off, so that he doesn't come and screw up any project I'm involved with.

C++ leads to really really bad design choices. You invariably start using
the "nice" library features of the language like STL and Boost and other
total and utter crap, that may "help" you program, but causes:

 - infinite amounts of pain when they don't work (and anybody who tells me
   that STL and especially Boost are stable and portable is just so full
   of BS that it's not even funny)

 - inefficient abstracted programming models where two years down the road
   you notice that some abstraction wasn't very efficient, but now all
   your code depends on all the nice object models around it, and you
   cannot fix it without rewriting your app.

In other words, the only way to do good, efficient, and system-level and
portable C++ ends up to limit yourself to all the things that are
basically available in C. And limiting your project to C means that people
don't screw that up, and also means that you get a lot of programmers that
do actually understand low-level issues and don't screw things up with any
idiotic "object model" crap.

So I'm sorry, but for something like git, where efficiency was a primary
objective, the "advantages" of C++ is just a huge mistake. The fact that
we also piss off people who cannot see that is just a big additional
advantage.

If you want a VCS that is written in C++, go play with Monotone. Really.
They use a "real database". They use "nice object-oriented libraries".
They use "nice C++ abstractions". And quite frankly, as a result of all
these design decisions that sound so appealing to some CS people, the end
result is a horrible and unmaintainable mess.

But I'm sure you'd like it more than git.

Linus


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-27 Thread Jason McCampbell
>
> Sure.  This list actually deserves a long writeup about that.   First,
> there wasn't a "Cython-refactor" of NumPy.   There was a Cython-refactor of
> SciPy.   I'm not sure of its current status.   I'm still very supportive
> of that sort of thing.
>
>
> I think I missed that - is it on git somewhere?
>
>
> I thought so, but I can't find it either.  We should ask Jason McCampbell
> of Enthought where the code is located.   Here are the distributed eggs:
> http://www.enthought.com/repo/.iron/
>
> -Travis
>

Hi Travis and everyone, just cleaning up email and saw this question.  The
trees had been in my personal GitHub account prior to Enthought switching
over.  I forked them now and the paths are:
https://github.com/enthought/numpy-refactor
https://github.com/enthought/scipy-refactor

The numpy code is on the 'refactor' branch.  The master branch is dated but
consistent (correct commit IDs) with the master NumPy repository on GitHub
so the refactor branch should be able to be pushed to the main numpy
account if desired.

The scipy code was cloned from the subversion repository and so would
either need to be moved back to svn or sync'd with any git migration.

Jason


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-22 Thread Charles R Harris
Hi Perry,

On Wed, Feb 22, 2012 at 6:44 AM, Perry Greenfield  wrote:

> I, like Travis, have my worries about C++. But if those actually doing
> the work (and particularly the subsequent support) feel it is the best
> language for implementation, I can live with that.
>
> I particularly like the incremental and conservative approach to
> introducing C++ that was proposed by Mark. What I would like to stress
> in doing this is that all along that process, extensive testing is
> performed (preferably with some build-bot process) to ensure that
> whatever C++ features are being introduced are fully portable and
> don't present intractable distribution issues. Whatever we do, we
> don't want to go far down that road only to find out that there is no
> good solution in that regard with certain platforms.
>
> We are particularly sensitive to this issue since we distribute our
> software, and anything that makes installation of numpy problematic is
> a very serious issue for us. It has to be an easy install on all
> common platforms. That is one thing C allowed, despite all its flaws:
> near-universal installation advantages over any other
> language available. If the appropriate subset of C++ can achieve that,
> great. But it has to be proved continuously as it is incrementally
> adopted. (I'm not much persuaded by comments like "my experience has
> shown it not to be a problem")
>
> Is there any disagreement with this?
>
> It's less clear to me what to do about more unusual platforms. It
> seems to me that some sort of testing against those that may prove
> important in the future (e.g., gpus?) will be needed, but how to do
> this is not clear to me.
>
>
Your group has been one of the best for testing numpy. What systems do you
support at this time?

Chuck


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-22 Thread Perry Greenfield
I, like Travis, have my worries about C++. But if those actually doing  
the work (and particularly the subsequent support) feel it is the best  
language for implementation, I can live with that.

I particularly like the incremental and conservative approach to  
introducing C++ that was proposed by Mark. What I would like to stress  
in doing this that all along that process, extensive testing is  
performed (preferably with some build-bot process) to ensure that  
whatever C++ features are being introduced are fully portable and  
don't present intractable distribution issues. Whatever we do, we  
don't want to go far down that road only to find out that there is no  
good solution in that regard with certain platforms.

We are particularly sensitive to this issue since we distribute our  
software, and anything that makes installation of numpy problematic is  
a very serious issue for us. It has to be an easy install on all  
common platforms. That is one thing C allowed, despite all its flaws:
near-universal installation advantages over any other
language available. If the appropriate subset of C++ can achieve that,
great. But it has to be proved continuously as it is incrementally  
adopted. (I'm not much persuaded by comments like "my experience has  
shown it not to be a problem")

Is there any disagreement with this?

It's less clear to me what to do about more unusual platforms. It  
seems to me that some sort of testing against those that may prove  
important in the future (e.g., gpus?) will be needed, but how to do  
this is not clear to me.

Perry


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-22 Thread Neal Becker
It's great advice to say:

avoid using new

and instead rely on scope and classes such as std::vector.

I just want to point out that sometimes objects must outlive their scope.

For those cases, std::shared_ptr can be helpful.



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-21 Thread Gael Varoquaux
On Sun, Feb 19, 2012 at 05:44:27AM -0500, David Warde-Farley wrote:
> I think the comments about the developer audience NumPy will attract are 
> important. There may be lots of C++ developers out there, but the 
> intersection of (truly competent in C++) and (likely to involve oneself in 
> NumPy development) may well be quite small.

That's a very valid concern. It is reminiscent of a possible cause of our
lack of contributors to Mayavi: contributing to Mayavi requires knowing
VTK. One of the major benefits of Mayavi is that it makes it easy to use
the power of VTK without understanding it well. The intersection of the
people interested in using Mayavi and able to contribute to it is almost
empty.

This is striking to me, because I know a lot of people who know VTK well.
Most of them couldn't care less for Mayavi: they are happy coding directly
in VTK in C++. This is also a reason why I don't code UIs any more: I
simply cannot find the resources to maintain them in proportion to the
number of users they garner. A sad statement.

Gael


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-21 Thread Nathaniel Smith
On Tue, Feb 21, 2012 at 4:04 AM, Travis Oliphant  wrote:
> It uses llvm-py (modified to work with LLVM 3.0) and code I wrote to do the
> translation from Python byte-code to LLVM.   This LLVM can then be "JIT"ed.
>   I have several applications that I would like to use this for.   It would
> be possible to write "more of NumPy" using this approach.     Initially, it
> makes it *very* easy to create a machine-code ufunc from Python code.
> There are other use-cases of having loops written in Python and plugged in
> to a calculation, filtering, or indexing framework that this system will be
> useful for.

Very neat!

It's interesting that you decided to use Python bytecode as your
source representation. I'm curious what your strategy is for
overcoming all the challenges that have plagued previous attempts to
efficiently compile "real Python"? (Unladen Swallow, PyPy, etc.) Just
support some subset of the language that's easy to handle and do type
inference over? Or do you plan to continue using Python as your input
language?

I guess the conventional wisdom would be that there's a lot of
potential for using LLVM to generate efficient specialized loops for
numpy on the fly (cf. llvm-pipe for a similar and successful project),
but that the key would be to use a more specialized representation
than Python bytecode -- one that left out hard/irrelevant parts of the
language, that had richer type information, that didn't change around
for different Python releases, etc.

-- Nathaniel


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Travis Oliphant
Interesting you bring this up.   I actually have a working prototype of using 
Python to emit LLVM.   I will be showing it at the HPC tutorial that I am 
giving at PyCon.  I will be making this available after PyCon to a wider 
audience as open source.  

It uses llvm-py (modified to work with LLVM 3.0) and code I wrote to do the 
translation from Python byte-code to LLVM.   This LLVM can then be "JIT"ed.   I 
have several applications that I would like to use this for.   It would be 
possible to write "more of NumPy" using this approach. Initially, it makes 
it *very* easy to create a machine-code ufunc from Python code.   There are 
other use-cases of having loops written in Python and plugged in to a 
calculation, filtering, or indexing framework that this system will be useful 
for.  

There is still a need for a core data-type object, a core array object, and a 
core calculation object.   Maybe some-day these cores can be shrunk to a 
smaller subset and more of something along the lines of LLVM generation from 
Python can be used.   But, there is a lot of work to do before that is 
possible.But, a lot of the currently pre-compiled loops can be done on the 
fly instead using this approach.There are several things I'm working on in 
that direction. 

This is not PyPy.   It certainly uses the same ideas that they are using, but 
instead it fits into the CPython run-time and doesn't require changing the 
whole ecosystem. If you are interested in this work let me know.  I think 
I'm going to call the project numpy-llvm, or fast-py, or something like that.   
It is available on github and will be open source (but it's still under active 
development). 

Here is an example of the code to create a ufunc using the system (this is like 
vectorize, but it creates machine code and by-passes the interpreter and so is 
100x faster).  

from math import sin, pi

def sinc(x):
    if x == 0:
        return 1.0
    else:
        return sin(x*pi)/(pi*x)

from translate import Translate
t = Translate(sinc)
t.translate()
print t.mod

res = t.make_ufunc('sinc')


-Travis



On Feb 20, 2012, at 10:55 AM, Sturla Molden wrote:

> Den 20.02.2012 17:42, skrev Sturla Molden:
>> There are still other options than C or C++ that are worth considering.
>> One would be to write NumPy in Python. E.g. we could use LLVM as a
>> JIT-compiler and produce the performance critical code we need on the fly.
>> 
>> 
> 
> LLVM and its C/C++ frontend Clang are BSD licenced. It compiles faster 
> than GCC and often produces better machine code. They can therefore be 
> used inside an array library. It would give a faster NumPy, and we could 
> keep most of it in Python.
> 
> Sturla
> 



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Robert Kern
On Mon, Feb 20, 2012 at 19:55, Paul Anton Letnes wrote:
>
> On 20. feb. 2012, at 16:29, Sturla Molden wrote:

>>> - in newer standards it has some nontrivial mathematical functions: gamma, 
>>> bessel, etc. that numpy lacks right now
>>
>> That belongs to SciPy.
>
> I don't see exactly why. Why should numpy have exponential but not gamma 
> functions? The division seems kinda arbitrary. Not that I am arguing 
> violently for bessel functions in numpy.

The semi-arbitrary dividing line that we have settled on is C99. If a
special function is in the C99 standard, we'll accept an
implementation for it in numpy. Part (well, most) of the rationale is
just to have a clear dividing line even if it's fairly arbitrary. The
other part is that if a decidedly non-mathematically-focused standard
like C99 includes a special function in its standard library, then
odds are good that it's something that is widely used enough as a
building block for other things.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Sturla Molden
Den 20.02.2012 21:12, skrev Sturla Molden:
>
> If you need to control the lifetime of an object, make an inner block
> with curly brackets, and declare it on top of the block. Don't call new
> and delete to control where you want it to be allocated and deallocated.
> Nothing goes on the heap unless STL puts it there.
>

Here is an example:


// bad

Foo *bar = new Foo();

delete bar;


// ok
{
    Foo bar;

}

Remember C++ does not allow a "finally" clause in exception handling. 
You cannot do this:

try {
    Foo *bar = new Foo();
} finally { // syntax error: C++ has no finally
    delete bar;
}

So...

Foo *bar = new Foo();
try {
    // ...
} catch(...) {
    // ...
}
// might not get here, possible
// resource leak
delete bar;


Which is why we should always do this:

{
    Foo bar;
}

This is perhaps the most common source of errors in C++ code. If we use 
C++ in the NumPy core, we need strict discipline against these types of 
obscure errors.


Sturla


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Sturla Molden
Den 20.02.2012 20:14, skrev Daniele Nicolodi:

> Hello Sturla, unrelated to the numpy rewrite debate, can you please 
> suggest some resources you think can be used to learn how to program 
> C++ "the proper way"? Thank you. Cheers, 

This is totally OT on this list, however ...

Scott Meyers' books have been mentioned. Also look at some literature 
on the STL (e.g. Josuttis). Getting the Boost library is essential as 
well. The Qt library has many examples of beautiful C++.

But the most important part, in my opinion, is to put the "C with 
classes" mentality away. Look at it as compiled Python or Java. The STL 
(the standard C++ library) has classes that do the same as the types we 
use in Python  --- there are parallels to tuple, dict, set, list, deque, 
etc. The STL is actually richer than Python. Just use them the way we 
use Python. With C++11 (the latest standard), even for loops can be like 
Python. There are lambdas and closures, to be used as in Python, and 
there is an 'auto' keyword for type inference; you don't have to declare 
the type of a variable, the compiler will figure it out.

Don't use new[] just because you can, when there is std::vector that 
behaves like a Python list.

If you need to allocate a resource, wrap it in a class. Allocate in 
the constructor and deallocate in the destructor. That way an exception 
cannot cause a resource leak, and the clean-up code will be called 
automatically when the object falls off the stack.

If you need to control the lifetime of an object, make an inner block 
with curly brackets, and declare it on top of the block. Don't call new 
and delete to control where you want it to be allocated and deallocated. 
Nothing goes on the heap unless STL puts it there.

Always put objects on the stack, never allocate to a pointer with new.  
Always use references, and forget about pointers. This has to do with 
putting the "C with classes" mentality away. Always implement a copy 
constructor so the classes work with the STL.

std::vector<double> x(n); // ok
void foobar(std::vector<double>& x); // ok

double* x = new double[n]; // bad
std::vector<double>* x = new std::vector<double>(n); // bad
void foobar(std::vector<double>* x); // bad

If you get any textbook on Windows programming from Microsoft Press, you 
have an excellent resource on what not to do. Verbose functions and 
field names, Hungarian notation, factories instead of constructors, 
etc.  If you find yourself using macros or template magic to avoid the 
overhead of a virtual function (MFC, ATL, wxWidgets, FOX), at the 
expense of readability, you are probably doing something you shouldn't. 
COM is probably the worst example I know of, just compare the beautiful 
OpenGL to Direct3D. VTK is another example of what I consider ugly C++.

But that's just my opinion.


Sturla


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread James Bergstra
Looks like Dag forked the discussion of lazy evaluation to a new thread
 ([Numpy-discussion] ndarray and lazy evaluation).

There are actually several projects inspired by this sort of design: off
the top of my head I can think of Theano, copperhead, numexpr, arguably
sympy, and some non-public code by Nicolas Pinto. So I think the strengths
of the approach in principle are established... the big question is how to
make this approach easy to use in all the settings where it could be
useful. I don't think any of these projects has gotten that totally right.

-JB

On Mon, Feb 20, 2012 at 2:41 PM, Lluís  wrote:

> Lluís  writes:
>
> > Francesc Alted writes:
> >> On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote:
> >>> You need at least a slightly different Python API to get anywhere, so
> >>> numexpr/Theano is the right place to work on an implementation of this
> >>> idea. Of course it would be nice if numexpr/Theano offered something as
> >>> convenient as
> >>>
> >>> with lazy:
> >>> arr = A + B + C # with all of these NumPy arrays
> >>> # compute upon exiting…
>
> >> Hmm, that would be cute indeed.  Do you have an idea on how the code in
> the with
> >> context could be passed to the Python AST compiler (à la
> numexpr.evaluate("A + B
> >> + C"))?
>
> > Well, I started writing some experiments to "almost transparently"
> translate
> > regular ndarray operations to numexpr strings (or others) using only
> python
> > code.
> [...]
> > My target was to use this to also generate optimized GPU kernels
> in-flight using
> > pycuda, but I think some other relatively recent project already
> performed
> > something similar (w.r.t. generating cuda kernels out of python
> expressions).
>
> Aaahhh, I just had a quick look at Theano and it seems it's the project I
> was
> referring to.
>
> Good job! :)
>
>
> Lluis
>
> --
>  "And it's much the same thing with knowledge, for whenever you learn
>  something new, the whole world becomes that much richer."
>  -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
>  Tollbooth



-- 
http://www-etud.iro.umontreal.ca/~bergstrj


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Paul Anton Letnes

On 20. feb. 2012, at 16:29, Sturla Molden wrote:

> Den 20.02.2012 08:35, skrev Paul Anton Letnes:
>> In the language wars, I have one question. Why is Fortran not being 
>> considered? Fortran already implements many of the features that we want in 
>> NumPy:
> 
> Yes ... but it does not make Fortran a systems programming language. 
> Making NumPy is different from using it.
> 
>> - slicing and similar operations, at least some of the fancy indexing kind
>> - element-wise array operations and function calls
>> - array bounds-checking and other debugging aid (with debugging flags)
> 
> That is nice for numerical computing, but not really needed to make NumPy.
> 
> 
>> - arrays that mentally map very well onto numpy arrays. To me, this spells 
>> +1 to ease of contribution, over some abstract C/C++ template
> 
> Mentally perhaps, but not binary. NumPy needs uniformly strided memory 
> on the binary level. Fortran just gives this at the mental level. E.g. 
> there is nothing that dictates a Fortran pointer has to be a view, the 
> compiler is free to employ copy-in copy-out. In Fortran, a function call 
> can invalidate a pointer.  One would therefore have to store the array 
> in an array of integer*1, and use the intrinsic function transfer() to 
> parse the contents into NumPy dtypes.
> 
>> - in newer standards it has some nontrivial mathematical functions: gamma, 
>> bessel, etc. that numpy lacks right now
> 
> That belongs to SciPy.

I don't see exactly why. Why should numpy have exponential but not gamma 
functions? The division seems kinda arbitrary. Not that I am arguing violently 
for bessel functions in numpy.

>> - compilers that are good at optimizing for floating-point performance, 
>> because that's what Fortran is all about
> 
> Insanely good, but not when we start to do the (binary, not mentally) 
> strided access that NumPy needs. (Not that C compilers would be any better.)
> 
> 
> 
>> - not Fortran as such, but BLAS and LAPACK are easily accessed by Fortran
>> - possibly other numerical libraries that can be helpful
>> - Fortran has, in its newer standards, thought of C interoperability. We 
>> could still keep bits of the code in C (or even C++?) if we'd like to, or 
>> perhaps f2py/Cython could do the wrapping.
> 
> Not f2py, as it depends on NumPy.
> 
> - some programmers know Fortran better than C++. Fortran is at least used by 
> many science guys, like me.
> 
> 
> That is a valid arguments. Fortran is also much easier to read and debug.
> 
> 
> Sturla

Thanks for an excellent answer, Sturla - very informative indeed.

Paul.


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Lluís
Lluís  writes:

> Francesc Alted writes:
>> On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote:
>>> You need at least a slightly different Python API to get anywhere, so 
>>> numexpr/Theano is the right place to work on an implementation of this 
>>> idea. Of course it would be nice if numexpr/Theano offered something as 
>>> convenient as
>>> 
>>> with lazy:
>>> arr = A + B + C # with all of these NumPy arrays
>>> # compute upon exiting…

>> Hmm, that would be cute indeed.  Do you have an idea on how the code in the 
>> with
>> context could be passed to the Python AST compiler (à la numexpr.evaluate("A 
>> + B
>> + C"))?

> Well, I started writing some experiments to "almost transparently" translate
> regular ndarray operations to numexpr strings (or others) using only python
> code.
[...]
> My target was to use this to also generate optimized GPU kernels in-flight 
> using
> pycuda, but I think some other relatively recent project already performed
> something similar (w.r.t. generating cuda kernels out of python expressions).

Aaahhh, I just had a quick look at Theano and it seems it's the project I was
referring to.

Good job! :)


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Neal Becker
Charles R Harris wrote:

> On Fri, Feb 17, 2012 at 12:09 PM, Benjamin Root  wrote:
> 
>>
>>
>> On Fri, Feb 17, 2012 at 1:00 PM, Christopher Jordan-Squire <
>> cjord...@uw.edu> wrote:
>>
>>> On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe  wrote:
>>> > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing 
>>> wrote:
>>> >>
>>> >> On 02/17/2012 05:39 AM, Charles R Harris wrote:
>>> >> >
>>> >> >
>>> >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau <
>>> courn...@gmail.com
>>> >> > > wrote:
>>> >> >
>>> >> > Hi Travis,
>>> >> >
>>> >> > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant
>>> >> > mailto:tra...@continuum.io>> wrote:
>>> >> >  > Mark Wiebe and I have been discussing off and on (as well as
>>> >> > talking with Charles) a good way forward to balance two competing
>>> >> > desires:
>>> >> >  >
>>> >> >  >* addition of new features that are needed in NumPy
>>> >> >  >* improving the code-base generally and moving towards
>>> a
>>> >> > more maintainable NumPy
>>> >> >  >
>>> >> >  > I know there are loud voices for just focusing on the second
>>> of
>>> >> > these and avoiding the first until we have finished that.  I
>>> >> > recognize the need to improve the code base, but I will also be
>>> >> > pushing for improvements to the feature-set and user experience
>>> in
>>> >> > the process.
>>> >> >  >
>>> >> >  > As a result, I am proposing a rough outline for releases over
>>> the
>>> >> > next year:
>>> >> >  >
>>> >> >  >* NumPy 1.7 to come out as soon as the serious bugs
>>> can be
>>> >> > eliminated.  Bryan, Francesc, Mark, and I are able to help triage
>>> >> > some of those.
>>> >> >  >
>>> >> >  >* NumPy 1.8 to come out in July which will have as many
>>> >> > ABI-compatible feature enhancements as we can add while improving
>>> >> > test coverage and code cleanup.   I will post to this list more
>>> >> > details of what we plan to address with it later.Included for
>>> >> > possible inclusion are:
>>> >> >  >* resolving the NA/missing-data issues
>>> >> >  >* finishing group-by
>>> >> >  >* incorporating the start of label arrays
>>> >> >  >* incorporating a meta-object
>>> >> >  >* a few new dtypes (variable-length string,
>>> >> > variable-length unicode and an enum type)
>>> >> >  >* adding ufunc support for flexible dtypes and possibly
>>> >> > structured arrays
>>> >> >  >* allowing generalized ufuncs to work on more kinds of
>>> >> > arrays besides just contiguous
>>> >> >  >* improving the ability for NumPy to receive
>>> JIT-generated
>>> >> > function pointers for ufuncs and other calculation opportunities
>>> >> >  >* adding "filters" to Input and Output
>>> >> >  >* simple computed fields for dtypes
>>> >> >  >* accepting a Data-Type specification as a class or
>>> JSON
>>> >> > file
>>> >> >  >* work towards improving the dtype-addition mechanism
>>> >> >  >* re-factoring of code so that it can compile with a
>>> C++
>>> >> > compiler and be minimally dependent on Python data-structures.
>>> >> >
>>> >> > This is a pretty exciting list of features. What is the rationale
>>> >> > for
>>> >> > code being compiled as C++ ? IMO, it will be difficult to do so
>>> >> > without preventing useful C constructs, and without removing
>>> some of
>>> >> > the existing features (like our use of C99 complex). The subset
>>> that
>>> >> > is both C and C++ compatible is quite constraining.
>>> >> >
>>> >> >
>>> >> > I'm in favor of this myself, C++ would allow a lot code cleanup and
>>> make
>>> >> > it easier to provide an extensible base, I think it would be a
>>> natural
>>> >> > fit with numpy. Of course, some C++ projects become tangled messes of
>>> >> > inheritance, but I'd be very interested in seeing what a good C++
>>> >> > designer like Mark, intimately familiar with the numpy code base,
>>> could
>>> >> > do. This opportunity might not come by again anytime soon and I
>>> think we
>>> >> > should grab onto it. The initial step would be a release whose code
>>> that
>>> >> > would compile in both C/C++, which mostly comes down to removing C++
>>> >> > keywords like 'new'.
>>> >> >
>>> >> > I did suggest running it by you for build issues, so please raise any
>>> >> > you can think of. Note that MatPlotLib is in C++, so I don't think
>>> the
>>> >> > problems are insurmountable. And choosing a set of compilers to
>>> support
>>> >> > is something that will need to be done.
>>> >>
>>> >> It's true that matplotlib relies heavily on C++, both via the Agg
>>> >> library and in its own extension code.  Personally, I don't like this;
>>> I
>>> >> think it raises the barrier to contributing.  C++ is an order of
>>> >> magnitu

Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Lluís
Francesc Alted writes:

> On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote:
>> You need at least a slightly different Python API to get anywhere, so 
>> numexpr/Theano is the right place to work on an implementation of this 
>> idea. Of course it would be nice if numexpr/Theano offered something as 
>> convenient as
>> 
>> with lazy:
>> arr = A + B + C # with all of these NumPy arrays
>> # compute upon exiting…

> Hmm, that would be cute indeed.  Do you have an idea on how the code in the 
> with
> context could be passed to the Python AST compiler (à la numexpr.evaluate("A 
> + B
> + C"))?

Well, I started writing some experiments to "almost transparently" translate
regular ndarray operations to numexpr strings (or others) using only python
code.

The concept is very simple:

# you only need the first one to start building the AST
a = lazy(np.arange(16))
b = np.arange(16)
res = a + b + 3
print evaluate(res)
# the actual evaluation can be delayed to something like __repr__ or __str__
print repr(res)
print res
# you could also delay evaluation until someone uses res to create a new array

My target was to use this to also generate optimized GPU kernels in-flight using
pycuda, but I think some other relatively recent project already performed
something similar (w.r.t. generating cuda kernels out of python expressions).

The supporting code for numexpr was something like:

import numexpr
import numpy as np

def build_arg_expr(arg, args):
    if isinstance(arg, Expr):
        # recursively build the expression
        arg_expr, arg_args = arg.build_expr()
        args.update(arg_args)
        return arg_expr
    else:
        # unique argument identifier
        arg_id = "arg_%d" % id(arg)
        args[arg_id] = arg
        return arg_id

# generic expression builder
class Expr:
    def evaluate(self):
        expr, args = self.build_expr()
        return numexpr.evaluate(expr, local_dict=args, global_dict={})

    def __repr__(self):
        return self.evaluate().__repr__()

    def __str__(self):
        return self.evaluate().__str__()

    def __add__(self, other):
        return ExprAdd(self, other)

# expression builder for adds
class ExprAdd(Expr):
    def __init__(self, arg1, arg2):
        self.arg1 = arg1
        self.arg2 = arg2

    def build_expr(self):
        args = {}
        expr1 = build_arg_expr(self.arg1, args)
        expr2 = build_arg_expr(self.arg2, args)
        return "(" + expr1 + ") + (" + expr2 + ")", args

# ndarray-like class to generate expression builders
class LazyNdArray(np.ndarray):
    def __add__(self, other):
        return ExprAdd(self, other)

# build a LazyNdArray
def lazy(arg):
    return arg.view(LazyNdArray)

# evaluate with numexpr an arbitrary expression builder
def evaluate(arg):
    return arg.evaluate()


The thing here is to always return to the user something that looks like an
ndarray.

As you can see the whole thing is not very complex, but some less funny code had
to be written meanwhile for work and I just dropped this :)


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Matthieu Brucher
2012/2/20 Daniele Nicolodi 

> On 18/02/12 04:54, Sturla Molden wrote:
> > This is not true. C++ can be much easier, particularly for those who
> > already know Python. The problem: C++ textbooks teach C++ as a subset
> > of C. Writing C in C++ just adds the complexity of C++ on top of C,
> > for no good reason. I can write FORTRAN in any language, it does not
> > mean it is a good idea. We would have to start by teaching people to
> > write good C++.  E.g., always use the STL like Python built-in types
> > if possible. Dynamic memory should be std::vector, not new or malloc.
> > Pointers should be replaced with references. We would have to write a
> > C++ programming tutorial that is based on Python knowledge instead of
> > C knowledge.
>
> Hello Sturla,
>
> unrelated to the numpy rewrite debate, can you please suggest some
> resources you think can be used to learn how to program C++ "the proper
> way"?
>

One of the best books may be "Accelerated C++" or Stroustrup's new book
(not The C++ Programming Language)

Matthieu
-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Daniele Nicolodi
On 18/02/12 04:54, Sturla Molden wrote:
> This is not true. C++ can be much easier, particularly for those who
> already know Python. The problem: C++ textbooks teach C++ as a subset
> of C. Writing C in C++ just adds the complexity of C++ on top of C,
> for no good reason. I can write FORTRAN in any language, it does not
> mean it is a good idea. We would have to start by teaching people to
> write good C++.  E.g., always use the STL like Python built-in types
> if possible. Dynamic memory should be std::vector, not new or malloc.
> Pointers should be replaced with references. We would have to write a
> C++ programming tutorial that is based on Python knowledge instead of
> C knowledge.

Hello Sturla,

unrelated to the numpy rewrite debate, can you please suggest some
resources you think can be used to learn how to program C++ "the proper
way"?

Thank you. Cheers,
-- 
Daniele


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Francesc Alted
On Feb 20, 2012, at 7:08 PM, Dag Sverre Seljebotn wrote:

> On 02/20/2012 09:34 AM, Christopher Jordan-Squire wrote:
>> On Mon, Feb 20, 2012 at 9:18 AM, Dag Sverre Seljebotn
>>   wrote:
>>> On 02/20/2012 08:55 AM, Sturla Molden wrote:
 Den 20.02.2012 17:42, skrev Sturla Molden:
> There are still other options than C or C++ that are worth considering.
> One would be to write NumPy in Python. E.g. we could use LLVM as a
> JIT-compiler and produce the performance critical code we need on the fly.
> 
> 
 
 LLVM and its C/C++ frontend Clang are BSD licenced. It compiles faster
 than GCC and often produces better machine code. They can therefore be
 used inside an array library. It would give a faster NumPy, and we could
 keep most of it in Python.
>>> 
>>> I think it is moot to focus on improving NumPy performance as long as in
>>> practice all NumPy operations are memory bound due to the need to take a
>>> trip through system memory for almost any operation. C/C++ is simply
>>> "good enough". JIT is when you're chasing a 2x improvement or so, but
>>> today NumPy can be 10-20x slower than a Cython loop.
>>> 
>> 
>> I don't follow this. Could you expand a bit more? (Specifically, I
>> wasn't aware that numpy could be 10-20x slower than a cython loop, if
>> we're talking about the base numpy library--so core operations. I'm
> 
> The problem with NumPy is the temporaries needed -- if you want to compute
> 
> A + B + np.sqrt(D)
> 
> then, if the arrays are larger than cache size (a couple of megabytes), 
> then each of those operations will first transfer the data in and out 
> over the memory bus. I.e. first you compute an element of sqrt(D), then 
> the result of that is put in system memory, then later the same number 
> is read back in order to add it to an element in B, and so on.
> 
> The compute-to-bandwidth ratio of modern CPUs is between 30:1 and 
> 60:1... so in extreme cases it's cheaper to do 60 additions than to 
> transfer a single number from system memory.
> 
> It is much faster to only transfer an element (or small block) from each 
> of A, B, and D to CPU cache, then do the entire expression, then 
> transfer the result back. This is easy to code in Cython/Fortran/C and 
> impossible with NumPy/Python.
> 
> This is why numexpr/Theano exists.

Well, I can't speak for Theano (it is quite a bit more general than numexpr, 
and more geared towards using GPUs, right?), but this was certainly the issue 
that made David Cooke create numexpr.  A more in-depth explanation of this 
problem can be seen in:

http://www.euroscipy.org/talk/1657

which includes some graphical explanations.

-- Francesc Alted





Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Dag Sverre Seljebotn
On 02/20/2012 09:34 AM, Christopher Jordan-Squire wrote:
> On Mon, Feb 20, 2012 at 9:18 AM, Dag Sverre Seljebotn
>   wrote:
>> On 02/20/2012 08:55 AM, Sturla Molden wrote:
>>> Den 20.02.2012 17:42, skrev Sturla Molden:
 There are still other options than C or C++ that are worth considering.
 One would be to write NumPy in Python. E.g. we could use LLVM as a
 JIT-compiler and produce the performance critical code we need on the fly.


>>>
>>> LLVM and its C/C++ frontend Clang are BSD licenced. It compiles faster
>>> than GCC and often produces better machine code. They can therefore be
>>> used inside an array library. It would give a faster NumPy, and we could
>>> keep most of it in Python.
>>
>> I think it is moot to focus on improving NumPy performance as long as in
>> practice all NumPy operations are memory bound due to the need to take a
>> trip through system memory for almost any operation. C/C++ is simply
>> "good enough". JIT is when you're chasing a 2x improvement or so, but
>> today NumPy can be 10-20x slower than a Cython loop.
>>
>
> I don't follow this. Could you expand a bit more? (Specifically, I
> wasn't aware that numpy could be 10-20x slower than a cython loop, if
> we're talking about the base numpy library--so core operations. I'm

The problem with NumPy is the temporaries needed -- if you want to compute

A + B + np.sqrt(D)

then, if the arrays are larger than cache size (a couple of megabytes), 
then each of those operations will first transfer the data in and out 
over the memory bus. I.e. first you compute an element of sqrt(D), then 
the result of that is put in system memory, then later the same number 
is read back in order to add it to an element in B, and so on.

The compute-to-bandwidth ratio of modern CPUs is between 30:1 and 
60:1... so in extreme cases it's cheaper to do 60 additions than to 
transfer a single number from system memory.

It is much faster to only transfer an element (or small block) from each 
of A, B, and D to CPU cache, then do the entire expression, then 
transfer the result back. This is easy to code in Cython/Fortran/C and 
impossible with NumPy/Python.

This is why numexpr/Theano exists.
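
For what it's worth, the C++ analogue of this fusion is the 
expression-template technique used by Blitz++ and Eigen: operator+ 
returns a lightweight proxy instead of computing anything, and the 
whole expression is evaluated in a single pass at assignment. A toy 
sketch (illustrative only, restricted to the a + b + c shape; real 
libraries generalize this heavily):

```cpp
#include <cstddef>
#include <vector>

// Proxy recording an addition; nothing is computed until operator[]
// is called, so "x = a + b + c" runs as one fused loop with no
// temporary arrays.
template <class L, class R>
struct Add {
    const L& l;
    const R& r;
    Add(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n) : data(n) {}
    double  operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i)       { return data[i]; }
    std::size_t size() const { return data.size(); }

    // single fused loop per assignment, whatever the expression depth
    template <class E>
    Vec& operator=(const E& e) {
        for (std::size_t i = 0; i < data.size(); ++i)
            data[i] = e[i];
        return *this;
    }
};

inline Add<Vec, Vec> operator+(const Vec& a, const Vec& b) {
    return Add<Vec, Vec>(a, b);
}

template <class L, class R>
Add<Add<L, R>, Vec> operator+(const Add<L, R>& a, const Vec& b) {
    return Add<Add<L, R>, Vec>(a, b);
}
```

Each element of a, b and c is read once and the result written once, 
which is exactly the memory-traffic pattern the paragraph above asks for.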

You can make the slowdown over Cython/Fortran/C almost arbitrarily large 
by adding terms to the equation above. So of course, the actual slowdown 
depends on your usecase.

> also not totally sure why a JIT is a 2x improvement or so vs. cython.
> Not that a disagree on either of these points, I'd just like a bit
> more detail.)

I meant that the JIT may be a 2x improvement over the current NumPy C 
code. There's some logic when iterating arrays that could perhaps be 
specialized away depending on the actual array layout at runtime.

But I'm thinking that probably a JIT wouldn't help all that much, so 
it's probably 1x -- the 2x was just to be very conservative w.r.t. the 
argument I was making, as I don't know the NumPy C sources well enough.

Dag


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Sturla Molden
Den 20.02.2012 18:34, skrev Christopher Jordan-Squire:
> I don't follow this. Could you expand a bit more? (Specifically, I 
> wasn't aware that numpy could be 10-20x slower than a cython loop, if 
> we're talking about the base numpy library--so core operations. I'm 
> also not totally sure why a JIT is a 2x improvement or so vs. cython. 
> Not that a disagree on either of these points, I'd just like a bit 
> more detail.) 

Dag Sverre is right about this.

NumPy is memory bound, Cython loops are (usually) CPU bound.

If you write:

 x[:] = a + b + c  # numpy arrays

then this happens (excluding reference counting):

- allocate temporary array
- loop over a and b, add to temporary
- allocate 2nd temporary array
- loop over 1st temporary array  and c, add to 2nd
- deallocate 1st temporary array
- loop over 2nd temporary array, assign to x
- deallocate 2nd temporary array

Since memory access is slow, memory allocation and deallocation
is slow, and computation is fast, this will be perhaps 10 times
slower than what we could do with a loop in Cython:

 for i in range(n):
 x[i] = a[i] + b[i] + c[i]

I.e. we get rid of the temporary arrays and the multiple loops.
All the temporaries here are put in registers.
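
The difference can be spelled out in plain C++ (a sketch, with 
std::vector standing in for ndarrays):

```cpp
#include <cstddef>
#include <vector>

// What "x[:] = a + b + c" does semantically: two temporary arrays
// and multiple passes over memory.
std::vector<double> add_with_temporaries(const std::vector<double>& a,
                                         const std::vector<double>& b,
                                         const std::vector<double>& c) {
    std::vector<double> t1(a.size()), t2(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) t1[i] = a[i] + b[i];   // pass 1
    for (std::size_t i = 0; i < a.size(); ++i) t2[i] = t1[i] + c[i];  // pass 2
    return t2;                                    // final assignment is pass 3
}

// The fused loop: one pass, the per-element temporaries stay
// in registers.
std::vector<double> add_fused(const std::vector<double>& a,
                              const std::vector<double>& b,
                              const std::vector<double>& c) {
    std::vector<double> x(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        x[i] = a[i] + b[i] + c[i];
    return x;
}
```

Both compute the same values; the fused version just touches memory 
far less once the arrays outgrow the cache.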

It is streaming data into the CPU that is slow, not computing!

There have actually been experiments with streaming data in a
compressed form, and decompressing on the fly, as data access
still dominates the runtime (even if you do a lot of computing
per element).


Sturla


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Sturla Molden
Den 20.02.2012 18:18, skrev Dag Sverre Seljebotn:
>
> I think it is moot to focus on improving NumPy performance as long as in
> practice all NumPy operations are memory bound due to the need to take a
> trip through system memory for almost any operation. C/C++ is simply
> "good enough". JIT is when you're chasing a 2x improvement or so, but
> today NumPy can be 10-20x slower than a Cython loop.
>
> You need at least a slightly different Python API to get anywhere, so
> numexpr/Theano is the right place to work on an implementation of this
> idea. Of course it would be nice if numexpr/Theano offered something as
> convenient as
>
> with lazy:
>   arr = A + B + C # with all of these NumPy arrays
> # compute upon exiting...
>
>

Lazy evaluation is nice. But I was thinking more about how to avoid C++ 
in the NumPy core, so more than 2 or 3 programmers could contribute.

I.e. my point was not that loops in LLVM would be much faster than C++ 
(that is besides the point), but the code could be written in Python 
instead of C++.

But if the idea is to support other languages as well (which I somehow 
forgot), then this approach certainly becomes less useful.

(OTOH, lazy evaluation is certainly easier to achieve with JIT 
compilation. But that will have to wait until NumPy 5.0 perhaps...)


Sturla


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Christopher Jordan-Squire
On Mon, Feb 20, 2012 at 9:18 AM, Dag Sverre Seljebotn
 wrote:
> On 02/20/2012 08:55 AM, Sturla Molden wrote:
>> Den 20.02.2012 17:42, skrev Sturla Molden:
>>> There are still other options than C or C++ that are worth considering.
>>> One would be to write NumPy in Python. E.g. we could use LLVM as a
>>> JIT-compiler and produce the performance critical code we need on the fly.
>>>
>>>
>>
>> LLVM and its C/C++ frontend Clang are BSD licenced. It compiles faster
>> than GCC and often produces better machine code. They can therefore be
>> used inside an array library. It would give a faster NumPy, and we could
>> keep most of it in Python.
>
> I think it is moot to focus on improving NumPy performance as long as in
> practice all NumPy operations are memory bound due to the need to take a
> trip through system memory for almost any operation. C/C++ is simply
> "good enough". JIT is when you're chasing a 2x improvement or so, but
> today NumPy can be 10-20x slower than a Cython loop.
>

I don't follow this. Could you expand a bit more? (Specifically, I
wasn't aware that numpy could be 10-20x slower than a cython loop, if
we're talking about the base numpy library--so core operations. I'm
also not totally sure why a JIT is a 2x improvement or so vs. cython.
Not that a disagree on either of these points, I'd just like a bit
more detail.)

Thanks,
Chris

> You need at least a slightly different Python API to get anywhere, so
> numexpr/Theano is the right place to work on an implementation of this
> idea. Of course it would be nice if numexpr/Theano offered something as
> convenient as
>
> with lazy:
>     arr = A + B + C # with all of these NumPy arrays
> # compute upon exiting...
>
> Dag


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Matthieu Brucher
2012/2/19 Sturla Molden 

> Den 19.02.2012 10:28, skrev Mark Wiebe:
> >
> > Particular styles of using templates can cause this, yes. To properly
> > do this kind of advanced C++ library work, it's important to think
> > about the big-O notation behavior of your template instantiations, not
> > just the big-O notation of run-time. C++ templates have a
> > turing-complete language (which is said to be quite similar to
> > haskell, but spelled vastly different) running at compile time in
> > them. This is what gives template meta-programming in C++ great power,
> > but since templates weren't designed for this style of programming
> > originally, template meta-programming is not very easy.
> >
> >
>
> The problem with metaprogramming is that we are doing manually the work
> that belongs to the compiler. Blitz++ was supposed to be a library that
> "thought like a compiler". But then compilers just got better. Today, it
> is no longer possible for a numerical library programmer to outsmart an
> optimizing C++ compiler. All metaprogramming can do today is produce
> error messages no one can understand. And the resulting code will often
> be slower because the compiler has less opportunities to do its work.
>

As I've said, the compiler is pretty much stupid. It cannot do what
Blitz++ did, or what Eigen is currently doing, mainly because of the
underlying languages (C or C++).

-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Francesc Alted
On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote:
> You need at least a slightly different Python API to get anywhere, so 
> numexpr/Theano is the right place to work on an implementation of this 
> idea. Of course it would be nice if numexpr/Theano offered something as 
> convenient as
> 
> with lazy:
> arr = A + B + C # with all of these NumPy arrays
> # compute upon exiting…

Hmm, that would be cute indeed.  Do you have an idea on how the code in the 
with context could be passed to the Python AST compiler (à la 
numexpr.evaluate("A + B + C"))?

-- Francesc Alted





Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Matthieu Brucher
2012/2/19 Nathaniel Smith 

> On Sun, Feb 19, 2012 at 9:16 AM, David Cournapeau 
> wrote:
> > On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe  wrote:
> >> Is there a specific
> >> target platform/compiler combination you're thinking of where we can do
> >> tests on this? I don't believe the compile times are as bad as many
> people
> >> suspect, can you give some simple examples of things we might do in
> NumPy
> >> you expect to compile slower in C++ vs C?
> >
> > Switching from gcc to g++ on the same codebase should not change much
> > compilation times. We should test, but that's not what worries me.
> > What worries me is when we start using C++ specific code, STL and co.
> > Today, scipy.sparse.sparsetools takes half of the build time  of the
> > whole scipy, and it does not even use fancy features. It also takes Gb
> > of ram when building in parallel.
>
> I like C++ but it definitely does have issues with compilation times.
>
> IIRC the main problem is very simple: STL and friends (e.g. Boost) are
> huge libraries, and because they use templates, the entire source code
> is in the header files. That means that as soon as you #include a few
> standard C++ headers, your innocent little source file has suddenly
> become hundreds of thousands of lines long, and it just takes the
> compiler a while to churn through megabytes of source code, no matter
> what it is. (Effectively you recompile some significant fraction of
> STL from scratch on every file, and then throw it away.)
>

In fact, Boost tries to be clean about this. Up until a few GCC minor releases
ago, their headers were a mess: when you included something, a lot of
additional code was brought in, and compile time exploded. But this is no
longer the case. If we restrict the core to a few includes, even with
templates, it should not take long to compile.

-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Sturla Molden
Den 20.02.2012 18:14, skrev Charles R Harris:
>
> Would that work for Ruby also? One of the advantages of C++ is that 
> the code doesn't need to be refactored to start with, just modified 
> step by step going into the future. I think PyPy is close to what you 
> are talking about.
>

If we plan to support more languages than Python, it might be better to 
use C++ (sorry).

But it does not mean that LLVM cannot be used. Either one can generate C 
or C++, or just use the assembly language (which is very simple and 
readable too: http://llvm.org/docs/LangRef.html).

We have exact knowledge about an ndarray at runtime:

- dtype
- dimensions
- strides
- whether the array is contiguous or not

This can be JIT-compiled into specialized looping code by LLVM. These 
kernels can then be stored in a database and reused.
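A rough pure-Python sketch of that specialize-and-cache scheme, with exec() standing in for the LLVM code generator (all names here are hypothetical):

```python
_kernel_cache = {}  # (op, dtype, contiguous) -> compiled loop function

def get_kernel(op, dtype, contiguous):
    """Return a loop specialized for the array's runtime properties."""
    key = (op, dtype, contiguous)
    if key not in _kernel_cache:
        if contiguous:
            # Contiguous case: simple unit-stride loop.
            src = ("def kernel(a, b, out, n):\n"
                   "    for i in range(n):\n"
                   f"        out[i] = a[i] {op} b[i]\n")
        else:
            # Strided case: honor explicit element strides sa, sb, so.
            src = ("def kernel(a, b, out, n, sa, sb, so):\n"
                   "    for i in range(n):\n"
                   f"        out[i * so] = a[i * sa] {op} b[i * sb]\n")
        ns = {}
        exec(src, ns)  # an LLVM backend would generate machine code here
        _kernel_cache[key] = ns["kernel"]
    return _kernel_cache[key]
```

A second call with the same (op, dtype, contiguity) key is a dictionary hit rather than a recompile, which is the "database of kernels" idea in miniature.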

If it matters, LLVM is embeddable in C++.


Sturla










Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Matthieu Brucher
2012/2/19 Matthew Brett 

> Hi,
>
> On Sat, Feb 18, 2012 at 8:38 PM, Travis Oliphant 
> wrote:
>
> > We will need to see examples of what Mark is talking about and clarify
> some
> > of the compiler issues.   Certainly there is some risk that once code is
> > written that it will be tempting to just use it.   Other approaches are
> > certainly worth exploring in the mean-time, but C++ has some strong
> > arguments for it.
>
> The worry as I understand it is that a C++ rewrite might make the
> numpy core effectively a read-only project for anyone but Mark.  Do
> you have any feeling for whether that is likely?
>

Some of us are C developers, others are C++ developers. It will depend on the
background of each of us.


> How would numpylib compare to libraries like eigen?  How likely do you
> think it would be that unrelated projects would use numpylib rather
> than eigen or other numerical libraries?  Do you think the choice of
> C++ rather than C will influence whether other projects will take it
> up?
>

I guess that the C++ port may open a door to changing the back-end, and
perhaps using Eigen or ArBB. As those guys (ArBB) wanted to provide a
Python interface compatible with NumPy to their VM, it may be interesting
to be able to change back-ends (although it is limited to one platform and
two OSes).



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Matthieu Brucher
> Would it be fair to say then, that you are expecting the discussion
> about C++ will mainly arise after Mark has written the code?   I
> can see that it will be easier to be specific at that point, but there
> must be a serious risk that it will be too late to seriously consider
> an alternative approach.
>
>
> We will need to see examples of what Mark is talking about and clarify
> some of the compiler issues.   Certainly there is some risk that once code
> is written that it will be tempting to just use it.   Other approaches are
> certainly worth exploring in the mean-time, but C++ has some strong
> arguments for it.
>

Compilers for C++98 are now stable enough (except on Blue Gene; see the
Boost distribution with xlc++).
C++ helps a lot to enhance robustness.

>
> From my perspective having a standalone core NumPy is still a goal.   The
> primary advantages of having a NumPy library (call it NumLib for the sake
> of argument) are
>
> 1) Ability for projects like PyPy, IronPython, and Jython to use it more
> easily
> 2) Ability for Ruby, Perl, Node.JS, and other new languages to use the
> code for their technical computing projects.
> 3) increasing the number of users who can help make it more solid
> 4) being able to build the user-base (and corresponding performance with
> eye-balls from Intel, NVidia, AMD, Microsoft, Google, etc. looking at the
> code).
>
> The disadvantages I can think of:
>  1) More users also means we might risk "lowest-common-denominator"
> problems --- i.e. trying to be too much to too many may make it not useful
> for anyone. Also, more users means more people with opinions that might be
> difficult to reconcile.
> 2) The work of doing the re-write is not small:  probably at least 6
> person-months
> 3) Not being able to rely on Python objects (dictionaries, lists, and
> tuples are currently used in the code-base quite a bit --- though the
> re-factor did show some examples of how to remove this usage).
> 4) Handling of "Object" arrays requires some re-design.
>
> I'm sure there are other factors that could be added to both lists.
>
> -Travis
>
>
>
> Thanks a lot for the reply,
>
> Matthew




Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Matthieu Brucher
> C++11 has this option:
>
> for (auto& item : container) {
> // iterate over the container object,
> // get a reference to each item
> //
> // "container" can be an STL class or
> // A C-style array with known size.
> }
>
> Which does this:
>
> for item in container:
> pass
>

It is even better than the macro-based approach because the compiler knows
everything is constant (start and end), so it can optimize more aggressively.


> > Using C++ templates to generate ufunc loops is an obvious application,
> > but again, in the simple examples
>
> Template metaprogramming?
>
> Don't even think about it. It is brain dead to try to outsmart the
> compiler.
>

It is really easy to outsmart the compiler. Really. I use metaprogramming
for loop creation to optimize cache behavior and communication in parallel
environments, and there is no way the compiler would have done things as
efficiently (and there is still a lot of leeway to improve my code).



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Dag Sverre Seljebotn
On 02/20/2012 08:55 AM, Sturla Molden wrote:
> Den 20.02.2012 17:42, skrev Sturla Molden:
>> There are still other options than C or C++ that are worth considering.
>> One would be to write NumPy in Python. E.g. we could use LLVM as a
>> JIT-compiler and produce the performance critical code we need on the fly.
>>
>>
>
> LLVM and its C/C++ frontend Clang are BSD licenced. It compiles faster
> than GCC and often produces better machine code. They can therefore be
> used inside an array library. It would give a faster NumPy, and we could
> keep most of it in Python.

I think it is moot to focus on improving NumPy performance as long as in 
practice all NumPy operations are memory bound due to the need to take a 
trip through system memory for almost any operation. C/C++ is simply 
"good enough". JIT is when you're chasing a 2x improvement or so, but 
today NumPy can be 10-20x slower than a Cython loop.

You need at least a slightly different Python API to get anywhere, so 
numexpr/Theano is the right place to work on an implementation of this 
idea. Of course it would be nice if numexpr/Theano offered something as 
convenient as

with lazy:
 arr = A + B + C # with all of these NumPy arrays
# compute upon exiting...

Dag


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Charles R Harris
On Mon, Feb 20, 2012 at 9:55 AM, Sturla Molden  wrote:

> Den 20.02.2012 17:42, skrev Sturla Molden:
> > There are still other options than C or C++ that are worth considering.
> > One would be to write NumPy in Python. E.g. we could use LLVM as a
> > JIT-compiler and produce the performance critical code we need on the
> fly.
> >
> >
>
> LLVM and its C/C++ frontend Clang are BSD licenced. It compiles faster
> than GCC and often produces better machine code. They can therefore be
> used inside an array library. It would give a faster NumPy, and we could
> keep most of it in Python.
>
>
Would that work for Ruby also? One of the advantages of C++ is that the
code doesn't need to be refactored to start with, just modified step by
step going into the future. I think PyPy is close to what you are talking
about.

Chuck


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Sturla Molden
Den 20.02.2012 17:42, skrev Sturla Molden:
> There are still other options than C or C++ that are worth considering.
> One would be to write NumPy in Python. E.g. we could use LLVM as a
> JIT-compiler and produce the performance critical code we need on the fly.
>
>

LLVM and its C/C++ frontend Clang are BSD licenced. It compiles faster 
than GCC and often produces better machine code. They can therefore be 
used inside an array library. It would give a faster NumPy, and we could 
keep most of it in Python.

Sturla



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Sturla Molden
Den 19.02.2012 00:09, skrev David Cournapeau:
> There are better languages than C++ that has most of the technical 
> benefits stated in this discussion (rust and D being the most 
> "obvious" ones), but whose usage is unrealistic today for various 
> reasons: knowledge, availability on "esoteric" platforms, etc… A new 
> language is completely ridiculous.

There are still other options than C or C++ that are worth considering. 
One would be to write NumPy in Python. E.g. we could use LLVM as a 
JIT-compiler and produce the performance critical code we need on the fly.

Sturla





Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Sturla Molden
Den 20.02.2012 08:35, skrev Paul Anton Letnes:
> As far as I can understand, implementing element-wise operations, slicing, 
> and a host of other NumPy features is in some sense pointless - the Fortran 
> compiler authors have already done it for us.

Only if you know the array dimensions in advance.

Sturla





Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Sturla Molden
Den 20.02.2012 08:35, skrev Paul Anton Letnes:
> In the language wars, I have one question. Why is Fortran not being 
> considered? Fortran already implements many of the features that we want in 
> NumPy:

Yes ... but it does not make Fortran a systems programming language. 
Making NumPy is different from using it.

> - slicing and similar operations, at least some of the fancy indexing kind
> - element-wise array operations and function calls
> - array bounds-checking and other debugging aid (with debugging flags)

That is nice for numerical computing, but not really needed to make NumPy.


> - arrays that mentally map very well onto numpy arrays. To me, this spells +1 
> to ease of contribution, over some abstract C/C++ template

Mentally perhaps, but not at the binary level. NumPy needs uniformly strided
memory at the binary level; Fortran only gives this at the mental level. E.g.
there is nothing that dictates a Fortran pointer has to be a view; the
compiler is free to employ copy-in copy-out, and a function call can
invalidate a pointer. One would therefore have to store the array as an
array of integer*1 and use the intrinsic function transfer() to parse the
contents into NumPy dtypes.

> - in newer standards it has some nontrivial mathematical functions: gamma, 
> bessel, etc. that numpy lacks right now

That belongs to SciPy.


> - compilers that are good at optimizing for floating-point performance, 
> because that's what Fortran is all about

Insanely good, but not when we start to do the (binary, not mentally) 
strided access that NumPy needs. (Not that C compilers would be any better.)



> - not Fortran as such, but BLAS and LAPACK are easily accessed by Fortran
> - possibly other numerical libraries that can be helpful
> - Fortran has, in its newer standards, thought of C interoperability. We 
> could still keep bits of the code in C (or even C++?) if we'd like to, or 
> perhaps f2py/Cython could do the wrapping.

Not f2py, as it depends on NumPy.

> - some programmers know Fortran better than C++. Fortran is at least used by 
> many science guys, like me.


That is a valid argument. Fortran is also much easier to read and debug.


Sturla







Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Sturla Molden
Den 20.02.2012 10:54, skrev Pauli Virtanen:
> Fortran is OK for simple numerical algorithms, but starts to suck 
> heavily if you need to do any string handling, I/O, complicated logic, 
> or data structures

For string handling, C is actually worse than Fortran. In Fortran a 
string can be sliced like in Python. It is not as nice as Python, but 
far better than C.

Fortran's built-in I/O syntax is archaic, but the ISO C bindings in 
Fortran 2003 mean one can use other means of I/O (POSIX, Win API, C 
stdio) in a portable way.

Sturla




Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Sturla Molden
Den 20.02.2012 12:43, skrev Charles R Harris:
>
>
> There also used to be a problem with unsigned types not being 
> available. I don't know if that is still the case.
>

Fortran -- like Python and Java -- does not have built-in unsigned 
integer types. It is never really a problem though. One can e.g. use a 
longer integer or keep them in an array of bytes.

(Fortran 2003 is OOP so it is possible to define one if needed. Not 
saying it is a good idea.)


Sturla





Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Charles R Harris
On Mon, Feb 20, 2012 at 2:54 AM, Pauli Virtanen  wrote:

> 20.02.2012 08:35, Paul Anton Letnes kirjoitti:
> > In the language wars, I have one question.
> > Why is Fortran not being considered?
>
> Fortran is OK for simple numerical algorithms, but starts to suck
> heavily if you need to do any string handling, I/O, complicated logic,
> or data structures.
>
> Most of the work in Numpy implementation is not actually in numerics,
> but in figuring out the correct operation to dispatch the computations
> to. So, this is one reason why Fortran is not considered.
>
>
There also used to be a problem with unsigned types not being available. I
don't know if that is still the case.

Chuck


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Stéfan van der Walt
On Mon, Feb 20, 2012 at 1:54 AM, Pauli Virtanen  wrote:
> 20.02.2012 08:35, Paul Anton Letnes kirjoitti:
>> In the language wars, I have one question.
>> Why is Fortran not being considered?
>
> Fortran is OK for simple numerical algorithms, but starts to suck
> heavily if you need to do any string handling, I/O, complicated logic,
> or data structures.

Out of curiosity, is this still true for the latest Fortran versions?
I guess there the problem may be compiler support over various
platforms.

Stéfan


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Pauli Virtanen
20.02.2012 08:35, Paul Anton Letnes kirjoitti:
> In the language wars, I have one question. 
> Why is Fortran not being considered?

Fortran is OK for simple numerical algorithms, but starts to suck
heavily if you need to do any string handling, I/O, complicated logic,
or data structures.

Most of the work in Numpy implementation is not actually in numerics,
but in figuring out the correct operation to dispatch the computations
to. So, this is one reason why Fortran is not considered.
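As a toy illustration of that dispatch work (a hypothetical sketch, not NumPy's actual promotion tables), even one binary operation needs a type-resolution step before any numerics run:

```python
# Hypothetical promotion table: (dtype_a, dtype_b) -> result dtype.
# The arithmetic itself is trivial; choosing the right loop is the work.
_PROMOTION = {
    ("int64", "int64"): "int64",
    ("int64", "float64"): "float64",
    ("float64", "float64"): "float64",
}

def result_dtype(da, db):
    """Return the promoted result dtype for a binary operation."""
    if (da, db) in _PROMOTION:
        return _PROMOTION[(da, db)]
    # Promotion is symmetric, so fall back to the mirrored pair.
    return _PROMOTION[(db, da)]
```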

-- 
Pauli Virtanen



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-20 Thread Samuel John

On 17.02.2012, at 21:46, Ralf Gommers wrote:
> [...]
> So far no one has managed to build the numpy/scipy combo with the LLVM-based 
> compilers, so if you were willing to have a go at fixing that it would be 
> hugely appreciated. See http://projects.scipy.org/scipy/ticket/1500 for 
> details.
> 
> Once that's fixed, numpy can switch to using it for releases.

Well, I had great success using clang and clang++ (which are LLVM-based) to 
compile both numpy and scipy on OS X 10.7.3.

Samuel



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Paul Anton Letnes
In the language wars, I have one question. Why is Fortran not being considered? 
Fortran already implements many of the features that we want in NumPy:
- slicing and similar operations, at least some of the fancy indexing kind
- element-wise array operations and function calls
- array bounds-checking and other debugging aid (with debugging flags)
- arrays that mentally map very well onto numpy arrays. To me, this spells +1 
to ease of contribution, over some abstract C/C++ template
- in newer standards it has some nontrivial mathematical functions: gamma, 
bessel, etc. that numpy lacks right now
- compilers that are good at optimizing for floating-point performance, because 
that's what Fortran is all about
- not Fortran as such, but BLAS and LAPACK are easily accessed by Fortran
- possibly other numerical libraries that can be helpful
- Fortran has, in its newer standards, thought of C interoperability. We could 
still keep bits of the code in C (or even C++?) if we'd like to, or perhaps 
f2py/Cython could do the wrapping.
- some programmers know Fortran better than C++. Fortran is at least used by 
many science guys, like me. Until someone comes along with actual numbers or at 
least anecdotal evidence, I don't think the "more programmers know X than Y" 
argument is too interesting. Personally I've learned both, and Fortran is much 
more accessible than C++ (to me) if you're used to the "work with (numpy) 
arrays" mentality.

As far as I can understand, implementing element-wise operations, slicing, and 
a host of other NumPy features is in some sense pointless - the Fortran 
compiler authors have already done it for us. Of course some nice wrapping will 
be needed in C, Cython, f2py, or similar. Since my understanding is limited, 
I'd be interested in being proved wrong, though :)

Paul



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Stéfan van der Walt
On Feb 19, 2012 4:14 PM, "Sturla Molden"  wrote:
>
> Den 20.02.2012 00:39, skrev Nathaniel Smith:
> > But there's an order-of-magnitude difference in compile times between
> > most real-world C projects and most real-world C++ projects. It might
> > not be a deal-breaker and it might not apply for the subset of C++ you're
> > planning to use, but AFAICT that's the facts.
>
> This is mainly a complaint about the build-process.

This has nothing to do with the build process. More complex languages take
longer to compile. The benchmark shown is also entirely independent of
build system.

Stéfan


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Sturla Molden
Den 20.02.2012 00:39, skrev Nathaniel Smith:
> But there's an order-of-magnitude difference in compile times between 
> most real-world C projects and most real-world C++ projects. It might 
> not be a deal-breaker and it might not apply for the subset of C++ you're 
> planning to use, but AFAICT that's the facts.

This is mainly a complaint about the build-process. Maybe make or 
distutils are broken, I don't know. But with a sane build tool (e.g. MS 
Visual Studio or Eclipse) this is not a problem. You just recompile the 
file you are working on, not the rest (unless you do a clean build).

Sturla



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Charles R Harris
On Sun, Feb 19, 2012 at 4:42 PM, Nathaniel Smith  wrote:

> On Sun, Feb 19, 2012 at 1:42 PM, Neal Becker  wrote:
> > On Fedora Linux I use ccache, which is completely transparent and makes
> > a huge difference in build times.
>
> ccache is fabulous (and it's fabulous for C too), but it only helps
> when 'make' has screwed up and decided to rebuild some file that
> didn't really need rebuilding, or when doing a clean build (which is
> more or less the same thing, if you think about it).
>
>
For Numpy, there are also other things going on. My clean builds finish in
about 30 seconds using one CPU; not-so-clean builds take longer.

Chuck


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Nathaniel Smith
On Sun, Feb 19, 2012 at 1:42 PM, Neal Becker  wrote:
> On Fedora Linux I use ccache, which is completely transparent and makes a huge
> difference in build times.

ccache is fabulous (and it's fabulous for C too), but it only helps
when 'make' has screwed up and decided to rebuild some file that
didn't really need rebuilding, or when doing a clean build (which is
more or less the same thing, if you think about it).

-- Nathaniel


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Nathaniel Smith
On Sun, Feb 19, 2012 at 7:13 PM, Mark Wiebe  wrote:
> On Sun, Feb 19, 2012 at 5:25 AM, Nathaniel Smith  wrote:
>> Precompiled headers can help some, but require complex and highly
>> non-portable build-system support. (E.g., gcc's precompiled header
>> constraints are here:
>> http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html -- only one
>> per source file, etc.)
>
> This doesn't look too bad, I think it would be worth setting these up in
> NumPy. The complexity you see is because its pretty close to the only way
> that precompiled headers could be set up.

Sure, so long as you know what headers every file needs. (Or, more
likely, figure out a more-or-less complete set of all the headers any file
might ever need, and then -include that into every file.)

>> To demonstrate: a trivial hello-world in C using <stdio.h>, versus a
>> trivial version in C++ using <iostream>.
>>
>> On my laptop (gcc 4.5.2), compiling each program 100 times in a loop
>> requires:
>>  C: 2.28 CPU seconds
>>  C compiled with C++ compiler: 4.61 CPU seconds
>>  C++: 17.66 CPU seconds
>> Slowdown for using g++ instead of gcc: 2.0x
>> Slowdown for using C++ standard library: 3.8x
>> Total C++ penalty: 7.8x
>>
>> Lines of code compiled in each case:
>>  $ gcc -E hello.c | wc
>>      855    2039   16934
>>  $ g++ -E hello.cc | wc
>>    18569   40994  437954
>> (I.e., the C++ hello world is almost half a megabyte.)
>>
>> Of course we won't be using <iostream>, but the other standard C++
>> headers all have the same basic character.
>
>
> Thanks for doing the benchmark. It is a bit artificial, however, and when I
> tried these trivial examples with -O0 and -O2, the difference (in gcc 4.7)
> of the C++ compile time was about 4%. In NumPy presently as it is in C, the
> difference between -O0 and -O2 is very significant, and any comparisons need
> to take this kind of thing into account. When I said I thought the
> compile-time differences would be smaller than many people expect, I was
> thinking about how this optimization phase, which is shared between C and
> C++, often dominates the compile times.

Sure -- but the effectively increased code size for STL-using C++
affects the optimizer too; it's effectively re-optimizing all the used
parts of STL again for each source file. (Presumably in this benchmark
that half megabyte of extra code is mostly unused, and therefore
getting thrown out before the optimizer does any work on it -- but
that doesn't happen if you're actually using the library!) Maybe
things have gotten better in the last year or two, I dunno; if you run
a better benchmark I'll listen. But there's an order-of-magnitude
difference in compile times between most real-world C projects and
most real-world C++ projects. It might not be a deal-breaker and it
might not apply for the subset of C++ you're planning to use, but AFAICT
that's the facts.

-- Nathaniel


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Mark Wiebe
On Sun, Feb 19, 2012 at 4:03 AM, David Cournapeau wrote:

> On Sun, Feb 19, 2012 at 9:28 AM, Mark Wiebe  wrote:
>
> > Is there anyone who uses a blue gene or small device which needs
> up-to-date
> > numpy support, that I could talk to directly? We really need a list of
> > supported platforms on the numpy wiki we can refer to when discussing
> this
> > stuff, it all seems very nebulous to me.
>
> They may not need an up to date numpy version now, but if stopping
> support for them is a requirement for C++, it must be kept in mind. I
> actually suspect Travis to have more details on the big iron side of
> things. On the small side of things:
> http://projects.scipy.org/numpy/ticket/1969
>
> This may not seem very useful - but that's part of what an open
> source project is all about in my mind.
>
> >
> > Particular styles of using templates can cause this, yes. To properly do
> > this kind of advanced C++ library work, it's important to think about the
> > big-O notation behavior of your template instantiations, not just the
> big-O
> > notation of run-time. C++ templates have a turing-complete language
> (which
> > is said to be quite similar to haskell, but spelled vastly different)
> > running at compile time in them. This is what gives template
> > meta-programming in C++ great power, but since templates weren't designed
> > for this style of programming originally, template meta-programming is
> not
> > very easy.
>
> scipy.sparse.sparsetools is quite straightforward in its usage of
> templates (would be great if you could suggest improvement BTW, e.g.
> scipy/sparse/sparsetools/csr.h), and does not by itself use any
> meta-template programming.
>

I took a look, and I think the reason this is so slow to compile and uses
so much memory is visible as follows:

[sparsetools]$ wc *.cxx | sort -n
   4039   13276  116263 csgraph_wrap.cxx
   6464   21385  189537 dia_wrap.cxx
  14002   45406  412262 coo_wrap.cxx
  32385  102534  963688 csc_wrap.cxx
  42997  140896 1313797 bsr_wrap.cxx
  50041  161127 1501400 csr_wrap.cxx
 149928  484624 4496947 total

That's almost 4.5MB of code, in 6 files. C/C++ compilers are not optimized
to compile this sort of thing fast; they are focused on more "human-style"
coding with smaller individual files. Looking at some of these
SWIG-generated files, the way they dispatch based on the input Python types
is bloated as well. Probably the main question I would ask is, does scipy
really need sparse matrix variants for all of int8, uint8, int16, uint16,
etc? Trimming away some of these might be reasonable, and would be a start
to improve compile times. The reason for the slowness is not C++ templates
in this example.

Cheers,
Mark


> I like that numpy can be built in a few seconds (at least without
> optimization), and consider this to be a useful feature.
>
> cheers,
>
> David


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Mark Wiebe
On Sun, Feb 19, 2012 at 5:25 AM, Nathaniel Smith  wrote:

> On Sun, Feb 19, 2012 at 9:16 AM, David Cournapeau 
> wrote:
> > On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe  wrote:
> >> Is there a specific
> >> target platform/compiler combination you're thinking of where we can do
> >> tests on this? I don't believe the compile times are as bad as many
> people
> >> suspect, can you give some simple examples of things we might do in
> NumPy
> >> you expect to compile slower in C++ vs C?
> >
> > Switching from gcc to g++ on the same codebase should not change much
> > compilation times. We should test, but that's not what worries me.
> > What worries me is when we start using C++ specific code, STL and co.
> > Today, scipy.sparse.sparsetools takes half of the build time  of the
> > whole scipy, and it does not even use fancy features. It also takes Gb
> > of ram when building in parallel.
>
> I like C++ but it definitely does have issues with compilation times.
>
> IIRC the main problem is very simple: STL and friends (e.g. Boost) are
> huge libraries, and because they use templates, the entire source code
> is in the header files. That means that as soon as you #include a few
> standard C++ headers, your innocent little source file has suddenly
> become hundreds of thousands of lines long, and it just takes the
> compiler a while to churn through megabytes of source code, no matter
> what it is. (Effectively you recompile some significant fraction of
> STL from scratch on every file, and then throw it away.)
>
> Precompiled headers can help some, but require complex and highly
> non-portable build-system support. (E.g., gcc's precompiled header
> constraints are here:
> http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html -- only one
> per source file, etc.)
>

This doesn't look too bad; I think it would be worth setting these up in
NumPy. The complexity you see is because it's pretty close to the only way
that precompiled headers could be set up.


> To demonstrate: a trivial hello-world in C using <stdio.h>, versus a
> trivial version in C++ using <iostream>.
>
> On my laptop (gcc 4.5.2), compiling each program 100 times in a loop
> requires:
>  C: 2.28 CPU seconds
>  C compiled with C++ compiler: 4.61 CPU seconds
>  C++: 17.66 CPU seconds
> Slowdown for using g++ instead of gcc: 2.0x
> Slowdown for using C++ standard library: 3.8x
> Total C++ penalty: 7.8x
>
> Lines of code compiled in each case:
>  $ gcc -E hello.c | wc
>    855    2039   16934
>  $ g++ -E hello.cc | wc
>  18569   40994  437954
> (I.e., the C++ hello world is almost half a megabyte.)
>
> Of course we won't be using <iostream>, but <vector>, <unordered_map>,
> etc. all have the same basic character.
>

Thanks for doing the benchmark. It is a bit artificial, however: when I
tried these trivial examples with -O0 and -O2, the difference in C++
compile time (in gcc 4.7) was about 4%. In NumPy as it presently is in C,
the difference between -O0 and -O2 is very significant, and any comparisons
need to take this kind of thing into account. When I said I thought the
compile-time differences would be smaller than many people expect, I was
thinking about how this optimization phase, which is shared between C and
C++, often dominates the compile times.

Cheers,
Mark


> -- Nathaniel
>
> (Test files attached, times were from:
>  time sh -c 'for i in $(seq 100); do gcc hello.c -o hello-c; done'
>  cp hello.c c-hello.cc
>  time sh -c 'for i in $(seq 100); do g++ c-hello.cc -o c-hello-cc; done'
>  time sh -c 'for i in $(seq 100); do g++ hello.cc -o hello-cc; done'
> and then summing the resulting user and system times.)
>


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Nathaniel Smith
On Sun, Feb 19, 2012 at 4:13 PM, xavier.gn...@gmail.com
 wrote:
> I'm not sure. If you want to be able to write A=B+C+D; with decent
> performances, I think you have to use a lib based on expression templates.
> It would be great if C++ compilers could automatically optimize out
> spurious copies into temporaries.
> However, I don't think the compilers are smart enough to do so...not yet.

But isn't this all irrelevant to numpy? Numpy is basically a large
collection of bare inner loops, plus a bunch of dynamic dispatch
machinery to make sure that the right one gets called at the right
time. Since these are exposed directly to Python, there's really no
way for the compiler to optimize out spurious copies or anything like
that -- even a very smart fortran-esque static compiler can't optimize
complex expressions like A=B+C+D if they simply aren't present at
compile time. And I guess even less-fancy C compilers will still be
able to optimize simple ufunc loops pretty well. IIUC the important
thing for numpy speed is the code that works out at runtime whether
this particular array would benefit from a column-based or row-based
strategy, chooses the right buffer sizes, etc., which isn't really
something a compiler can help with.

-- Nathaniel


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread xavier.gn...@gmail.com
On 02/19/2012 04:48 PM, Sturla Molden wrote:
> Den 19.02.2012 10:28, skrev Mark Wiebe:
>> Particular styles of using templates can cause this, yes. To properly
>> do this kind of advanced C++ library work, it's important to think
>> about the big-O notation behavior of your template instantiations, not
>> just the big-O notation of run-time. C++ templates have a
>> turing-complete language (which is said to be quite similar to
>> haskell, but spelled vastly different) running at compile time in
>> them. This is what gives template meta-programming in C++ great power,
>> but since templates weren't designed for this style of programming
>> originally, template meta-programming is not very easy.
>>
>>
> The problem with metaprogramming is that we are doing manually the work
> that belongs to the compiler. Blitz++ was supposed to be a library that
> "thought like a compiler". But then compilers just got better. Today, it
> is no longer possible for a numerical library programmer to outsmart an
> optimizing C++ compiler. All metaprogramming can do today is produce
> error messages noone can understand. And the resulting code will often
> be slower because the compiler has less opportunities to do its work.
>
> Sturla
"Today, it is no longer possible for a numerical library programmer to 
outsmart an optimizing C++ compiler."
I'm not sure. If you want to be able to write A=B+C+D; with decent
performances, I think you have to use a lib based on expression templates.
It would be great if C++ compilers could automatically optimize out 
spurious copies into temporaries.
However, I don't think the compilers are smart enough to do so...not yet.

Xavier


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Sturla Molden
Den 19.02.2012 10:28, skrev Mark Wiebe:
>
> Particular styles of using templates can cause this, yes. To properly 
> do this kind of advanced C++ library work, it's important to think 
> about the big-O notation behavior of your template instantiations, not 
> just the big-O notation of run-time. C++ templates have a 
> turing-complete language (which is said to be quite similar to 
> haskell, but spelled vastly different) running at compile time in 
> them. This is what gives template meta-programming in C++ great power, 
> but since templates weren't designed for this style of programming 
> originally, template meta-programming is not very easy.
>
>

The problem with metaprogramming is that we are doing manually the work 
that belongs to the compiler. Blitz++ was supposed to be a library that 
"thought like a compiler". But then compilers just got better. Today, it 
is no longer possible for a numerical library programmer to outsmart an 
optimizing C++ compiler. All metaprogramming can do today is produce 
error messages no one can understand. And the resulting code will often 
be slower because the compiler has fewer opportunities to do its work.

Sturla


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Neal Becker
Sturla Molden wrote:

> 
> Den 18. feb. 2012 kl. 01:58 skrev Charles R Harris
> :
> 
>> 
>> 
>> On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau  wrote:
>> I don't think c++ has any significant advantage over c for high performance
>> libraries. I am not convinced by the number of people argument either: it is
>> not my experience that c++ is easier to maintain in an open source context,
>> where the level of people is far from consistent. I doubt many people did not
>> contribute to numpy because it is in c instead of c++. While this is somewhat
>> subjective, there are reasons that c is much more common than c++ in that
>> context.
>> 
>> 
>> I think C++ offers much better tools than C for the sort of things in Numpy.
>> The compiler will take care of lots of things that now have to be hand
>> crafted and I wouldn't be surprised to see the code size shrink by a
>> significant factor.
> 
> The C++11 standard is fantastic. There are automatic data types, closures,
> reference counting, weak references, an improved STL with datatypes that map
> almost 1:1 against any built-in Python type, a sane threading API, regex, etc.
> Even the PRNG is Mersenne Twister by standard. With C++11 it is finally possible
> to "write C++ (almost) like Python". On the downside, C++ takes a long time to
> learn, most C++ textbooks teach bad programming habits from beginning to
> end, and C++ becomes inherently dangerous if you write C++ like C. Many
> also abuse C++ as a bloatware generator. Templates can also be abused to
> write code that is impossible to debug. While it in theory could be better, C
> is a much smaller language. Personally I prefer C++ to C, but I am not
> convinced it will be better for NumPy.
> 

I'm all for c++11, but if you are worried about portability, dude, you have a 
bit of a problem here.

> I agree about Cython. It is nice for writing a Python interface for C, but get
> messy and unclean when used for anything else. It also has too much focus on
> adding all sorts of "new features" instead of correctness and stability. I
> don't trust it to generate bug-free code anymore.
> 
> For wrapping C, Swig might be just as good. For C++, SIP, CXX or Boost.Python
> work well too.
> 
> If crazy ideas are allowed, what about PyPy RPython? Or perhaps Go? Or even C#
> if a native compiler could be found?
> 
> 
c# is a non-starter if you want to run on linux.



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Neal Becker
Nathaniel Smith wrote:

> On Sun, Feb 19, 2012 at 9:16 AM, David Cournapeau  wrote:
>> On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe  wrote:
>>> Is there a specific
>>> target platform/compiler combination you're thinking of where we can do
>>> tests on this? I don't believe the compile times are as bad as many people
>>> suspect, can you give some simple examples of things we might do in NumPy
>>> you expect to compile slower in C++ vs C?
>>
>> Switching from gcc to g++ on the same codebase should not change much
>> compilation times. We should test, but that's not what worries me.
>> What worries me is when we start using C++ specific code, STL and co.
>> Today, scipy.sparse.sparsetools takes half of the build time  of the
>> whole scipy, and it does not even use fancy features. It also takes Gb
>> of ram when building in parallel.
> 
> I like C++ but it definitely does have issues with compilation times.
> 
> IIRC the main problem is very simple: STL and friends (e.g. Boost) are
> huge libraries, and because they use templates, the entire source code
> is in the header files. That means that as soon as you #include a few
> standard C++ headers, your innocent little source file has suddenly
> become hundreds of thousands of lines long, and it just takes the
> compiler a while to churn through megabytes of source code, no matter
> what it is. (Effectively you recompile some significant fraction of
> STL from scratch on every file, and then throw it away.)
> 
> Precompiled headers can help some, but require complex and highly
> non-portable build-system support. (E.g., gcc's precompiled header
> constraints are here:
> http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html -- only one
> per source file, etc.)
> 
> To demonstrate: a trivial hello-world in C using <stdio.h>, versus a
> trivial version in C++ using <iostream>.
> 
> On my laptop (gcc 4.5.2), compiling each program 100 times in a loop requires:
>   C: 2.28 CPU seconds
>   C compiled with C++ compiler: 4.61 CPU seconds
>   C++: 17.66 CPU seconds
> Slowdown for using g++ instead of gcc: 2.0x
> Slowdown for using C++ standard library: 3.8x
> Total C++ penalty: 7.8x
> 
> Lines of code compiled in each case:
>   $ gcc -E hello.c | wc
>     855    2039   16934
>   $ g++ -E hello.cc | wc
>   18569   40994  437954
> (I.e., the C++ hello world is almost half a megabyte.)
> 
> Of course we won't be using <iostream>, but <vector>, <unordered_map>,
> etc. all have the same basic character.
> 
> -- Nathaniel
> 
> (Test files attached, times were from:
>   time sh -c 'for i in $(seq 100); do gcc hello.c -o hello-c; done'
>   cp hello.c c-hello.cc
>   time sh -c 'for i in $(seq 100); do g++ c-hello.cc -o c-hello-cc; done'
>   time sh -c 'for i in $(seq 100); do g++ hello.cc -o hello-cc; done'
> and then summing the resulting user and system times.)

On Fedora linux I use ccache, which is completely transparent and makes a huge 
difference in build times.



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Neal Becker
Sturla Molden wrote:

> Den 19.02.2012 01:12, skrev Nathaniel Smith:
>>
>> I don't oppose it, but I admit I'm not really clear on what the
>> supposed advantages would be. Everyone seems to agree that
>>-- Only a carefully-chosen subset of C++ features should be used
>>-- But this subset would be pretty useful
>> I wonder if anyone is actually thinking of the same subset :-).
> 
> Probably not, everybody has their own favourite subset.
> 
> 
>>
>> Chuck mentioned iterators as one advantage. I don't understand, since
>> iterators aren't even a C++ feature, they're just objects with "next"
>> and "dereference" operators. The only difference between these is
>> spelling:
>>for (my_iter i = foo.begin(); i != foo.end(); ++i) { ... }
>>for (my_iter i = my_iter_begin(foo); !my_iter_ended(&i);
>> my_iter_next(&i)) { ... }
>> So I assume he's thinking about something more, but the discussion has
>> been too high-level for me to figure out what.
> 
>

I find the range interface (i.e., boost::range) far more useful than the raw
iterator interface.  I always write all my algorithms using this abstraction.



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Nathaniel Smith
On Sun, Feb 19, 2012 at 9:16 AM, David Cournapeau  wrote:
> On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe  wrote:
>> Is there a specific
>> target platform/compiler combination you're thinking of where we can do
>> tests on this? I don't believe the compile times are as bad as many people
>> suspect, can you give some simple examples of things we might do in NumPy
>> you expect to compile slower in C++ vs C?
>
> Switching from gcc to g++ on the same codebase should not change much
> compilation times. We should test, but that's not what worries me.
> What worries me is when we start using C++ specific code, STL and co.
> Today, scipy.sparse.sparsetools takes half of the build time  of the
> whole scipy, and it does not even use fancy features. It also takes Gb
> of ram when building in parallel.

I like C++ but it definitely does have issues with compilation times.

IIRC the main problem is very simple: STL and friends (e.g. Boost) are
huge libraries, and because they use templates, the entire source code
is in the header files. That means that as soon as you #include a few
standard C++ headers, your innocent little source file has suddenly
become hundreds of thousands of lines long, and it just takes the
compiler a while to churn through megabytes of source code, no matter
what it is. (Effectively you recompile some significant fraction of
STL from scratch on every file, and then throw it away.)

Precompiled headers can help some, but require complex and highly
non-portable build-system support. (E.g., gcc's precompiled header
constraints are here:
http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html -- only one
per source file, etc.)

To demonstrate: a trivial hello-world in C using <stdio.h>, versus a
trivial version in C++ using <iostream>.

On my laptop (gcc 4.5.2), compiling each program 100 times in a loop requires:
  C: 2.28 CPU seconds
  C compiled with C++ compiler: 4.61 CPU seconds
  C++: 17.66 CPU seconds
Slowdown for using g++ instead of gcc: 2.0x
Slowdown for using C++ standard library: 3.8x
Total C++ penalty: 7.8x

Lines of code compiled in each case:
  $ gcc -E hello.c | wc
    855    2039   16934
  $ g++ -E hello.cc | wc
  18569   40994  437954
(I.e., the C++ hello world is almost half a megabyte.)

Of course we won't be using <iostream>, but <vector>, <unordered_map>,
etc. all have the same basic character.

-- Nathaniel

(Test files attached, times were from:
  time sh -c 'for i in $(seq 100); do gcc hello.c -o hello-c; done'
  cp hello.c c-hello.cc
  time sh -c 'for i in $(seq 100); do g++ c-hello.cc -o c-hello-cc; done'
  time sh -c 'for i in $(seq 100); do g++ hello.cc -o hello-cc; done'
and then summing the resulting user and system times.)
#include <stdio.h>

int main(int argc, char **argv) 
{
  printf("Hello, world!\n");
  return 0;
}
#include <iostream>

int main(int argc, char **argv)
{
  std::cout << "Hello, world!" << std::endl;
  return 0;
}


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread David Warde-Farley
On 2012-02-19, at 12:47 AM, Benjamin Root wrote:

> Dude, have you seen the .c files in numpy/core? They are already read-only 
> for pretty much everybody but Mark.

I've managed to patch several of them without incident, and I do not do a lot 
of programming in C. It could be simpler, but it's not really a big deal to 
navigate once you've spent some time reading it.

I think the comments about the developer audience NumPy will attract are 
important. There may be lots of C++ developers out there, but the intersection 
of (truly competent in C++) and (likely to involve oneself in NumPy 
development) may well be quite small.

David


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Ralf Gommers
On Sun, Feb 19, 2012 at 10:28 AM, Mark Wiebe  wrote:

> On Sun, Feb 19, 2012 at 3:16 AM, David Cournapeau wrote:
>
>> On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe  wrote:
>> > On Sat, Feb 18, 2012 at 4:24 PM, David Cournapeau 
>> > wrote:
>> >>
>> >> On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris
>> >>  wrote:
>> >>
>> >> >
>> >> > Well, we already have code obfuscation (DOUBLE_your_pleasure,
>> >> > FLOAT_your_boat), so we might as well let the compiler handle it.
>> >>
>> >> Yes, those are not great, but on the other hand, it is not that a
>> >> fundamental issue IMO.
>> >>
>> >> Iterators as we have it in NumPy is something that is clearly limited
>> >> by C. Writing the neighborhood iterator is the only case where I
>> >> really felt that C++ *could* be a significant improvement. I use
>> >> *could* because writing iterator in C++ is hard, and will be much
>> >> harder to read (I find both boost and STL - e.g. stlport -- iterators
>> >> to be close to write-only code). But there is the question on how you
>> >> can make C++-based iterators available in C. I would be interested in
>> >> a simple example of how this could be done, ignoring all the other
>> >> issues (portability, exception, etc…).
>> >>
>> >> The STL is also potentially compelling, but that's where we go into my
>> >> "beware of the dragons" area of C++. Portability loss, compilation
>> >> time increase and warts are significant there.
>> >> scipy.sparse.sparsetools has been a source of issues that was quite
>> >> high compared to its proportion of scipy amount code (we *do* have
>> >> some hard-won experience on C++-related issues).
>> >
>> >
>> > These standard library issues were definitely valid 10 years ago, but
>> all
>> > the major C++ compilers have great C++98 support now.
>>
>> STL varies significantly between platforms, I believe it is still the
>> case today. Do you know the status of the STL on Blue Gene, on small
>> devices? We unfortunately cannot restrict ourselves to one well known
>> implementation (e.g. STLPort).
>
>
> Is there anyone who uses a Blue Gene or small device which needs
> up-to-date numpy support, that I could talk to directly? We really need a
> list of supported platforms on the numpy wiki we can refer to when
> discussing this stuff, it all seems very nebulous to me.
>

The list of officially supported platforms, where supported means we test
and release binaries if appropriate, is short: Windows, Linux, OS X.  There
are many platforms which are "supported" in the form of feedback on the
mailing list or Trac. This explanation is written down somewhere, not sure
where right now.

The best way to get an overview of those is to look at the distutils code
for various compilers, and at npy_cpu.h and similar. We're not talking
about expanding the number of officially supported platforms here, but not
breaking those unofficially supported ones (too badly). It's possible we
break those once in a while, which becomes apparent only when we get a
patch of a few lines long that fixes it. What should be avoided is that
those few-line patches have to turn into very large patches.

The most practical way to deal with this is probably to take two or three
non-standard platforms/compilers, set up a buildbot on them, and when
things break ensure that fixing it is not too hard.

From recent history, I'd suggest AIX, an ARM device and a PathScale
compiler. But the limitation is probably finding someone willing to run a
buildbot.

Ralf


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread David Cournapeau
On Sun, Feb 19, 2012 at 9:28 AM, Mark Wiebe  wrote:

> Is there anyone who uses a Blue Gene or small device which needs up-to-date
> numpy support, that I could talk to directly? We really need a list of
> supported platforms on the numpy wiki we can refer to when discussing this
> stuff, it all seems very nebulous to me.

They may not need an up to date numpy version now, but if stopping
support for them is a requirement for C++, it must be kept in mind. I
actually suspect Travis to have more details on the big iron side of
things. On the small side of things:
http://projects.scipy.org/numpy/ticket/1969

This may not seem very useful - but that's part of what an open
source project is all about in my mind.

>
> Particular styles of using templates can cause this, yes. To properly do
> this kind of advanced C++ library work, it's important to think about the
> big-O notation behavior of your template instantiations, not just the big-O
> notation of run-time. C++ templates have a turing-complete language (which
> is said to be quite similar to haskell, but spelled vastly different)
> running at compile time in them. This is what gives template
> meta-programming in C++ great power, but since templates weren't designed
> for this style of programming originally, template meta-programming is not
> very easy.

scipy.sparse.sparsetools is quite straightforward in its usage of
templates (would be great if you could suggest improvement BTW, e.g.
scipy/sparse/sparsetools/csr.h), and does not by itself use any
meta-template programming.

I like that numpy can be built in a few seconds (at least without
optimization), and consider this to be a useful feature.

cheers,

David


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Mark Wiebe
On Sun, Feb 19, 2012 at 3:16 AM, David Cournapeau wrote:

> On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe  wrote:
> > On Sat, Feb 18, 2012 at 4:24 PM, David Cournapeau 
> > wrote:
> >>
> >> On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris
> >>  wrote:
> >>
> >> >
> >> > Well, we already have code obfuscation (DOUBLE_your_pleasure,
> >> > FLOAT_your_boat), so we might as well let the compiler handle it.
> >>
> >> Yes, those are not great, but on the other hand, it is not that a
> >> fundamental issue IMO.
> >>
> >> Iterators as we have it in NumPy is something that is clearly limited
> >> by C. Writing the neighborhood iterator is the only case where I
> >> really felt that C++ *could* be a significant improvement. I use
> >> *could* because writing iterator in C++ is hard, and will be much
> >> harder to read (I find both boost and STL - e.g. stlport -- iterators
> >> to be close to write-only code). But there is the question on how you
> >> can make C++-based iterators available in C. I would be interested in
> >> a simple example of how this could be done, ignoring all the other
> >> issues (portability, exception, etc…).
> >>
> >> The STL is also potentially compelling, but that's where we go into my
> >> "beware of the dragons" area of C++. Portability loss, compilation
> >> time increase and warts are significant there.
> >> scipy.sparse.sparsetools has been a source of issues that was quite
> >> high compared to its proportion of scipy amount code (we *do* have
> >> some hard-won experience on C++-related issues).
> >
> >
> > These standard library issues were definitely valid 10 years ago, but all
> > the major C++ compilers have great C++98 support now.
>
> STL varies significantly between platforms, I believe it is still the
> case today. Do you know the status of the STL on Blue Gene, on small
> devices? We unfortunately cannot restrict ourselves to one well known
> implementation (e.g. STLPort).


Is there anyone who uses a Blue Gene or small device which needs up-to-date
numpy support, that I could talk to directly? We really need a list of
supported platforms on the numpy wiki we can refer to when discussing this
stuff, it all seems very nebulous to me.

> Is there a specific
> > target platform/compiler combination you're thinking of where we can do
> > tests on this? I don't believe the compile times are as bad as many
> people
> > suspect, can you give some simple examples of things we might do in NumPy
> > you expect to compile slower in C++ vs C?
>
> Switching from gcc to g++ on the same codebase should not change much
> compilation times. We should test, but that's not what worries me.
> What worries me is when we start using C++ specific code, STL and co.
> Today, scipy.sparse.sparsetools takes half of the build time  of the
> whole scipy, and it does not even use fancy features. It also takes Gb
> of ram when building in parallel.
>

Particular styles of using templates can cause this, yes. To properly do
this kind of advanced C++ library work, it's important to think about the
big-O notation behavior of your template instantiations, not just the big-O
notation of run-time. C++ templates have a turing-complete language (which
is said to be quite similar to haskell, but spelled vastly different)
running at compile time in them. This is what gives template
meta-programming in C++ great power, but since templates weren't designed
for this style of programming originally, template meta-programming is not
very easy.

Cheers,
Mark


>
> David


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread David Cournapeau
On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe  wrote:
> On Sat, Feb 18, 2012 at 4:24 PM, David Cournapeau 
> wrote:
>>
>> On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris
>>  wrote:
>>
>> >
>> > Well, we already have code obfuscation (DOUBLE_your_pleasure,
>> > FLOAT_your_boat), so we might as well let the compiler handle it.
>>
>> Yes, those are not great, but on the other hand, it is not that a
>> fundamental issue IMO.
>>
>> Iterators as we have it in NumPy is something that is clearly limited
>> by C. Writing the neighborhood iterator is the only case where I
>> really felt that C++ *could* be a significant improvement. I use
>> *could* because writing iterator in C++ is hard, and will be much
>> harder to read (I find both boost and STL - e.g. stlport -- iterators
>> to be close to write-only code). But there is the question on how you
>> can make C++-based iterators available in C. I would be interested in
>> a simple example of how this could be done, ignoring all the other
>> issues (portability, exception, etc…).
>>
>> The STL is also potentially compelling, but that's where we go into my
>> "beware of the dragons" area of C++. Portability loss, compilation
>> time increase and warts are significant there.
>> scipy.sparse.sparsetools has been a source of issues that was quite
>> high compared to its proportion of scipy amount code (we *do* have
>> some hard-won experience on C++-related issues).
>
>
> These standard library issues were definitely valid 10 years ago, but all
> the major C++ compilers have great C++98 support now.

STL varies significantly between platforms, I believe it is still the
case today. Do you know the status of the STL on Blue Gene, on small
devices? We unfortunately cannot restrict ourselves to one well known
implementation (e.g. STLPort).

> Is there a specific
> target platform/compiler combination you're thinking of where we can do
> tests on this? I don't believe the compile times are as bad as many people
> suspect, can you give some simple examples of things we might do in NumPy
> you expect to compile slower in C++ vs C?

Switching from gcc to g++ on the same codebase should not change much
compilation times. We should test, but that's not what worries me.
What worries me is when we start using C++ specific code, STL and co.
Today, scipy.sparse.sparsetools takes half of the build time  of the
whole scipy, and it does not even use fancy features. It also takes Gb
of ram when building in parallel.

David


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Stéfan van der Walt
On Feb 19, 2012 12:09 AM, "Mark Wiebe"  wrote:
>
> These standard library issues were definitely valid 10 years ago, but all
the major C++ compilers have great C++98 support now. Is there a specific
target platform/compiler combination you're thinking of where we can do
tests on this? I don't believe the compile times are as bad as many people
suspect, can you give some simple examples of things we might do in NumPy
you expect to compile slower in C++ vs C?

The concern may be more that this will be an issue once we start templating
(scipy.sparse as an example). Compiling templates requires a lot of memory
(more than with the current Heath Robinson solution).

Stéfan


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Ralf Gommers
On Sun, Feb 19, 2012 at 6:47 AM, Benjamin Root  wrote:

>
> All kidding aside, is your concern that when Mark starts this, no one
> will be able to contribute until he is done? I can tell you right now that
> won't be the case as I will be trying to flesh out issues with datetime64
> with him.
>

If you're interested in that, you may be interested in
https://github.com/numpy/numpy/pull/156. It's about datetime behavior and
compile issues, which are the main reason we can't have a 1.7 release right
now.

Ralf


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread Mark Wiebe
On Sat, Feb 18, 2012 at 4:24 PM, David Cournapeau wrote:

> On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris
>  wrote:
>
> >
> > Well, we already have code obfuscation (DOUBLE_your_pleasure,
> > FLOAT_your_boat), so we might as well let the compiler handle it.
>
> Yes, those are not great, but on the other hand, it is not that
> fundamental an issue, IMO.
>
> Iterators as we have them in NumPy are something that is clearly
> limited by C. Writing the neighborhood iterator is the only case where
> I really felt that C++ *could* be a significant improvement. I use
> *could* because writing iterators in C++ is hard, and they will be much
> harder to read (I find both boost and STL -- e.g. stlport -- iterators
> to be close to write-only code). But there is the question of how you
> can make C++-based iterators available in C. I would be interested in
> a simple example of how this could be done, ignoring all the other
> issues (portability, exceptions, etc.).
>
> The STL is also potentially compelling, but that's where we go into my
> "beware of the dragons" area of C++. Portability loss, compilation
> time increase and warts are significant there.
> scipy.sparse.sparsetools has been a source of issues that was quite
> high compared to its proportion of scipy amount code (we *do* have
> some hard-won experience on C++-related issues).


These standard library issues were definitely valid 10 years ago, but all
the major C++ compilers have great C++98 support now. Is there a specific
target platform/compiler combination you're thinking of where we can do
tests on this? I don't believe the compile times are as bad as many people
suspect, can you give some simple examples of things we might do in NumPy
you expect to compile slower in C++ vs C?

-Mark


> >
> > Jim Hugunin was a keynote speaker at one of the scipy conventions. At
> dinner
> > he said that if he was to do it again he would use managed code ;) I
> don't
> > propose we do that, but tools do advance.
>
> In an ideal world, we would have a better language than C++ that can
> be spit out as C for portability. I have looked for a way to do this
> for as long as I have been contributing to NumPy (I have looked at
> ooc, D, coccinelle at various stages). I believe the best way is
> actually in the vein of FFTW: written in a very high level language
> (OCAML) for the hard part, and spitting out C. This is better than C++
> in many ways - this is also clearly not realistic :)
>
> David


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Matthew Brett
Hi,

On Sat, Feb 18, 2012 at 10:09 PM, Charles R Harris
 wrote:
>
>
> On Sat, Feb 18, 2012 at 9:38 PM, Travis Oliphant 
> wrote:
>>

>> Sure.  This list actually deserves a long writeup about that.   First,
>> there wasn't a "Cython-refactor" of NumPy.   There was a Cython-refactor of
>> SciPy.   I'm not sure of its current status.   I'm still very supportive of
>> that sort of thing.
>>
>>
>> I think I missed that - is it on git somewhere?
>>
>>
>> I thought so, but I can't find it either.  We should ask Jason McCampbell
>> of Enthought where the code is located.   Here are the distributed eggs:
>>   http://www.enthought.com/repo/.iron/
>
>
> Refactor is with the other numpy repos here.

I think Travis is referring to the _scipy_ refactor here.   I can't
see that with the numpy repos, or with the scipy repos, but I may have
missed it.

See you,

Matthew


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Matthew Brett
Hi,

On Sat, Feb 18, 2012 at 9:47 PM, Benjamin Root  wrote:
>
>
> On Saturday, February 18, 2012, Matthew Brett wrote:
>>
>> Hi,
>>
>> On Sat, Feb 18, 2012 at 8:38 PM, Travis Oliphant 
>> wrote:
>>
>> > We will need to see examples of what Mark is talking about and clarify
>> > some
>> > of the compiler issues.   Certainly there is some risk that once code is
>> > written that it will be tempting to just use it.   Other approaches are
>> > certainly worth exploring in the mean-time, but C++ has some strong
>> > arguments for it.
>>
>> The worry as I understand it is that a C++ rewrite might make the
>> numpy core effectively a read-only project for anyone but Mark.  Do
>> you have any feeling for whether that is likely?
>>
>
> Dude, have you seen the .c files in numpy/core? They are already read-only
> for pretty much everybody but Mark.

I think the question is whether refactoring in C would be preferable
to refactoring in C++.

> All kidding aside, is your concern that when Mark starts this that no one
> will be able to contribute until he is done? I can tell you right now that
> won't be the case as I will be trying to flesh out issues with datetime64
> with him.

No - can I refer you back to the emails from David in particular about
the difficulties of sharing development in C++?  I can find the links
- but do you remember the ones I'm referring to?

See you,

Matthew


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Charles R Harris
On Sat, Feb 18, 2012 at 9:38 PM, Travis Oliphant wrote:

>
> The decision will not be made until NumPy 2.0 work is farther along.
> The most likely outcome is that Mark will develop something quite nice in
> C++ which he is already toying with, and we will either choose to use it in
> NumPy to build 2.0 on --- or not.   I'm interested in sponsoring Mark and
> working as closely as I can with he and Chuck to see what emerges.
>
>
> Would it be fair to say then, that you are expecting the discussion
> about C++ will mainly arise after Mark has written the code?   I
> can see that it will be easier to be specific at that point, but there
> must be a serious risk that it will be too late to seriously consider
> an alternative approach.
>
>
> We will need to see examples of what Mark is talking about and clarify
> some of the compiler issues.   Certainly there is some risk that once code
> is written that it will be tempting to just use it.   Other approaches are
> certainly worth exploring in the mean-time, but C++ has some strong
> arguments for it.
>
>
> Can you say a little more about your impression of the previous Cython
>
> refactor and why it was not successful?
>
>
>
> Sure.  This list actually deserves a long writeup about that.   First,
> there wasn't a "Cython-refactor" of NumPy.   There was a Cython-refactor of
> SciPy.   I'm not sure of its current status.   I'm still very supportive
> of that sort of thing.
>
>
> I think I missed that - is it on git somewhere?
>
>
> I thought so, but I can't find it either.  We should ask Jason McCampbell
> of Enthought where the code is located.   Here are the distributed eggs:
> http://www.enthought.com/repo/.iron/
>

Refactor is with the other numpy repos
here.


Chuck


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Benjamin Root
On Saturday, February 18, 2012, Matthew Brett wrote:

> Hi,
>
> On Sat, Feb 18, 2012 at 8:38 PM, Travis Oliphant 
> >
> wrote:
>
> > We will need to see examples of what Mark is talking about and clarify
> some
> > of the compiler issues.   Certainly there is some risk that once code is
> > written that it will be tempting to just use it.   Other approaches are
> > certainly worth exploring in the mean-time, but C++ has some strong
> > arguments for it.
>
> The worry as I understand it is that a C++ rewrite might make the
> numpy core effectively a read-only project for anyone but Mark.  Do
> you have any feeling for whether that is likely?
>
>
Dude, have you seen the .c files in numpy/core? They are already read-only
for pretty much everybody but Mark.

All kidding aside, is your concern that when Mark starts this that no one
will be able to contribute until he is done? I can tell you right now that
won't be the case as I will be trying to flesh out issues with datetime64
with him.

Ben Root


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Matthew Brett
Hi,

On Sat, Feb 18, 2012 at 8:38 PM, Travis Oliphant  wrote:

> We will need to see examples of what Mark is talking about and clarify some
> of the compiler issues.   Certainly there is some risk that once code is
> written that it will be tempting to just use it.   Other approaches are
> certainly worth exploring in the mean-time, but C++ has some strong
> arguments for it.

The worry as I understand it is that a C++ rewrite might make the
numpy core effectively a read-only project for anyone but Mark.  Do
you have any feeling for whether that is likely?

> I thought so, but I can't find it either.  We should ask Jason McCampbell of
> Enthought where the code is located.   Here are the distributed eggs:
>   http://www.enthought.com/repo/.iron/

Should I email him?  Happy to do that.

> From my perspective having a standalone core NumPy is still a goal.   The
> primary advantages of having a NumPy library (call it NumLib for the sake of
> argument) are
>
> 1) Ability for projects like PyPy, IronPython, and Jython to use it more
> easily
> 2) Ability for Ruby, Perl, Node.JS, and other new languages to use the code
> for their technical computing projects.
> 3) increasing the number of users who can help make it more solid
> 4) being able to build the user-base (and corresponding performance with
> eye-balls from Intel, NVidia, AMD, Microsoft, Google, etc. looking at the
> code).
>
> The disadvantages I can think of:
> 1) More users also means we might risk "lowest-common-denominator" problems
> --- i.e. trying to be too much to too many may make it not useful for
> anyone. Also, more users means more people with opinions that might be
> difficult to reconcile.
> 2) The work of doing the re-write is not small:  probably at least 6
> person-months
> 3) Not being able to rely on Python objects (dictionaries, lists, and tuples
> are currently used in the code-base quite a bit --- though the re-factor did
> show some examples of how to remove this usage).
> 4) Handling of "Object" arrays requires some re-design.

How would numpylib compare to libraries like eigen?  How likely do you
think it would be that unrelated projects would use numpylib rather
than eigen or other numerical libraries?  Do you think the choice of
C++ rather than C will influence whether other projects will take it
up?

See you,

Matthew


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Travis Oliphant
>> 
>> The decision will not be made until NumPy 2.0 work is farther along. The 
>> most likely outcome is that Mark will develop something quite nice in C++ 
>> which he is already toying with, and we will either choose to use it in 
>> NumPy to build 2.0 on --- or not.   I'm interested in sponsoring Mark and 
>> working as closely as I can with he and Chuck to see what emerges.
> 
> Would it be fair to say then, that you are expecting the discussion
> about C++ will mainly arise after Mark has written the code?   I
> can see that it will be easier to be specific at that point, but there
> must be a serious risk that it will be too late to seriously consider
> an alternative approach.

We will need to see examples of what Mark is talking about and clarify some of 
the compiler issues.   Certainly there is some risk that once code is written 
that it will be tempting to just use it.   Other approaches are certainly worth 
exploring in the mean-time, but C++ has some strong arguments for it. 


>>> Can you say a little more about your impression of the previous Cython
>>> refactor and why it was not successful?
>>> 
>> 
>> Sure.  This list actually deserves a long writeup about that.   First, there 
>> wasn't a "Cython-refactor" of NumPy.   There was a Cython-refactor of SciPy. 
>>   I'm not sure of its current status.   I'm still very supportive of that 
>> sort of thing.
> 
> I think I missed that - is it on git somewhere?

I thought so, but I can't find it either.  We should ask Jason McCampbell of 
Enthought where the code is located.   Here are the distributed eggs:   
http://www.enthought.com/repo/.iron/

-Travis

> 
>> Another factor.   the decision to make an extra layer of indirection makes 
>> small arrays that much slower.   I agree with Mark that in a core library we 
>> need to go the other way with small arrays being completely allocated in the 
>> data-structure itself (reducing the number of pointer de-references
> 
> Does that imply there was a review of the refactor at some point to do
> things like benchmarking?   Are there any sources to get started
> trying to understand the nature of the Numpy refactor and where it ran
> into trouble?  Was it just the small arrays?

The main trouble was just the pace of development of NumPy and the divergence 
of the trees so that the re-factor branch did not keep up.  Its changes were 
quite extensive, and so were some of Mark's.   So, that created the difficulty 
in merging them together.   Mark's review of the re-factor was that small-array 
support was going to get worse.   I'm not sure if we ever did any bench-marking 
in that direction. 

> 
>> So, Cython did not play a major role on the NumPy side of things.   It 
>> played a very nice role on the SciPy side of things.
> 
> I guess Cython was attractive because the desire was to make a
> stand-alone library?   If that is still the goal, presumably that
> excludes Cython from serious consideration?  What are the primary
> advantages of making the standalone library?  Are there any serious
> disbenefits?

From my perspective having a standalone core NumPy is still a goal.   The 
primary advantages of having a NumPy library (call it NumLib for the sake of 
argument) are 

1) Ability for projects like PyPy, IronPython, and Jython to use it 
more easily
2) Ability for Ruby, Perl, Node.JS, and other new languages to use the 
code for their technical computing projects.
3) increasing the number of users who can help make it more solid
4) being able to build the user-base (and corresponding performance 
with eye-balls from Intel, NVidia, AMD, Microsoft, Google, etc. looking at the 
code). 

The disadvantages I can think of: 

1) More users also means we might risk "lowest-common-denominator" 
problems --- i.e. trying to be too much to too many may make it not useful for 
anyone. Also, more users means more people with opinions that might be 
difficult to reconcile. 
2) The work of doing the re-write is not small:  probably at least 6 
person-months
3) Not being able to rely on Python objects (dictionaries, lists, and 
tuples are currently used in the code-base quite a bit --- though the re-factor 
did show some examples of how to remove this usage).
4) Handling of "Object" arrays requires some re-design.

I'm sure there are other factors that could be added to both lists. 

-Travis


> 
> Thanks a lot for the reply,
> 
> Matthew


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread David Warde-Farley
On 2012-02-18, at 2:47 AM, Matthew Brett wrote:

> Of course it might be that so-far undiscovered C++ developers are
> drawn to a C++ rewrite of Numpy.  But it that really likely?

If we can trick them into thinking the GIL doesn't exist, then maybe...

David


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Matthew Brett
On Sat, Feb 18, 2012 at 5:18 PM, Matthew Brett  wrote:
> Hi,
>
> On Sat, Feb 18, 2012 at 2:54 PM, Travis Oliphant  wrote:
>>
>> On Feb 18, 2012, at 4:03 PM, Matthew Brett wrote:
>>
>>> Hi,
>>>
>>> On Sat, Feb 18, 2012 at 1:57 PM, Travis Oliphant  
>>> wrote:
 The C/C++ discussion is just getting started.  Everyone should keep in mind
 that this is not something that is going to happening quickly.   This will
 be a point of discussion throughout the year.    I'm not a huge supporter 
 of
 C++, but C++11 does look like it's made some nice progress, and as I think
 about making a core-set of NumPy into a library that can be called by
 multiple languages (and even multiple implementations of Python), tempered
 C++ seems like it might be an appropriate way to go.
>>>
>>> Could you say more about this?  Do you have any idea when the decision
>>> about C++ is likely to be made?  At what point does it make most sense
>>> to make the argument for or against?  Can you suggest a good way for
>>> us to be able to make more substantial arguments either way?
>>
>> I think early arguments against are always appropriate --- if you believe 
>> they have a chance of swaying Mark or Chuck who are the strongest supporters 
>> of C++ at this point.     I will be quite nervous about going crazy with 
>> C++.   It was suggested that I use C++ 7 years ago when I wrote NumPy.   I 
>> didn't go that route then largely because of compiler issues,  ABI-concerns, 
>> and I knew C better than C++ so I felt like it would have taken me longer to 
>> do something in C++.     I made the right decision for me.   If you think my 
>> C-code is horrible, you would have been completely offended by whatever C++ 
>> I might have done at the time.
>>
>> But I basically agree with Chuck that there is a lot of C-code in NumPy and 
>> template-based-code that is really trying to be C++ spelled differently.
>>
>> The decision will not be made until NumPy 2.0 work is farther along.     The 
>> most likely outcome is that Mark will develop something quite nice in C++ 
>> which he is already toying with, and we will either choose to use it in 
>> NumPy to build 2.0 on --- or not.   I'm interested in sponsoring Mark and 
>> working as closely as I can with he and Chuck to see what emerges.
>
> Would it be fair to say then, that you are expecting the discussion
> about C++ will mainly arise after Mark has written the code?   I
> can see that it will be easier to be specific at that point, but there
> must be a serious risk that it will be too late to seriously consider
> an alternative approach.
>
>>> Can you say a little more about your impression of the previous Cython
>>> refactor and why it was not successful?
>>>
>>
>> Sure.  This list actually deserves a long writeup about that.   First, there 
>> wasn't a "Cython-refactor" of NumPy.   There was a Cython-refactor of SciPy. 
>>   I'm not sure of its current status.   I'm still very supportive of that 
>> sort of thing.
>
> I think I missed that - is it on git somewhere?
>
>> I don't know if Cython ever solved the "raising an exception in a 
>> Fortran-called call-back" issue.   I used setjmp and longjmp in several 
>> places in SciPy originally in order to enable exceptions raised in a 
>> Python-callback that is wrapped in a C-function pointer and being handed to 
>> a Fortran-routine that asks for a function-pointer.
>>
>> What happened in NumPy was that the code was re-factored to become a 
>> library.   I don't think much NumPy code actually ended up in Cython (the 
>> random-number generators have been in Cython from the beginning).
>>
>>
>> The biggest problem with merging the code was that Mark Wiebe got active at 
>> about that same time :-)   He ended up changing several things in the 
>> code-base that made it difficult to merge-in the changes.   Some of the 
>> bug-fixes and memory-leak patches, and tests did get into the code-base, but 
>> the essential creation of the NumPy library did not make it.   There was 
>> some very good work done that I hope we can still take advantage of.
>
>> Another factor.   the decision to make an extra layer of indirection makes 
>> small arrays that much slower.   I agree with Mark that in a core library we 
>> need to go the other way with small arrays being completely allocated in the 
>> data-structure itself (reducing the number of pointer de-references
>
> Does that imply there was a review of the refactor at some point to do
> things like benchmarking?   Are there any sources to get started
> trying to understand the nature of the Numpy refactor and where it ran
> into trouble?  Was it just the small arrays?
>
>> So, Cython did not play a major role on the NumPy side of things.   It 
>> played a very nice role on the SciPy side of things.
>
> I guess Cython was attractive because the desire was to make a

Sorry - that should read "I guess Cython was _not_ attractive ... "

> stand-alone library?   If that is still th

Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Matthew Brett
Hi,

On Sat, Feb 18, 2012 at 2:54 PM, Travis Oliphant  wrote:
>
> On Feb 18, 2012, at 4:03 PM, Matthew Brett wrote:
>
>> Hi,
>>
>> On Sat, Feb 18, 2012 at 1:57 PM, Travis Oliphant  wrote:
>>> The C/C++ discussion is just getting started.  Everyone should keep in mind
>>> that this is not something that is going to happening quickly.   This will
>>> be a point of discussion throughout the year.    I'm not a huge supporter of
>>> C++, but C++11 does look like it's made some nice progress, and as I think
>>> about making a core-set of NumPy into a library that can be called by
>>> multiple languages (and even multiple implementations of Python), tempered
>>> C++ seems like it might be an appropriate way to go.
>>
>> Could you say more about this?  Do you have any idea when the decision
>> about C++ is likely to be made?  At what point does it make most sense
>> to make the argument for or against?  Can you suggest a good way for
>> us to be able to make more substantial arguments either way?
>
> I think early arguments against are always appropriate --- if you believe 
> they have a chance of swaying Mark or Chuck who are the strongest supporters 
> of C++ at this point.     I will be quite nervous about going crazy with C++. 
>   It was suggested that I use C++ 7 years ago when I wrote NumPy.   I didn't 
> go that route then largely because of compiler issues,  ABI-concerns, and I 
> knew C better than C++ so I felt like it would have taken me longer to do 
> something in C++.     I made the right decision for me.   If you think my 
> C-code is horrible, you would have been completely offended by whatever C++ I 
> might have done at the time.
>
> But I basically agree with Chuck that there is a lot of C-code in NumPy and 
> template-based-code that is really trying to be C++ spelled differently.
>
> The decision will not be made until NumPy 2.0 work is farther along.     The 
> most likely outcome is that Mark will develop something quite nice in C++ 
> which he is already toying with, and we will either choose to use it in NumPy 
> to build 2.0 on --- or not.   I'm interested in sponsoring Mark and working 
> as closely as I can with he and Chuck to see what emerges.

Would it be fair to say then, that you are expecting the discussion
about C++ will mainly arise after Mark has written the code?   I
can see that it will be easier to be specific at that point, but there
must be a serious risk that it will be too late to seriously consider
an alternative approach.

>> Can you say a little more about your impression of the previous Cython
>> refactor and why it was not successful?
>>
>
> Sure.  This list actually deserves a long writeup about that.   First, there 
> wasn't a "Cython-refactor" of NumPy.   There was a Cython-refactor of SciPy.  
>  I'm not sure of its current status.   I'm still very supportive of that 
> sort of thing.

I think I missed that - is it on git somewhere?

> I don't know if Cython ever solved the "raising an exception in a 
> Fortran-called call-back" issue.   I used setjmp and longjmp in several 
> places in SciPy originally in order to enable exceptions raised in a 
> Python-callback that is wrapped in a C-function pointer and being handed to a 
> Fortran-routine that asks for a function-pointer.
>
> What happened in NumPy was that the code was re-factored to become a library. 
>   I don't think much NumPy code actually ended up in Cython (the 
> random-number generators have been in Cython from the beginning).
>
>
> The biggest problem with merging the code was that Mark Wiebe got active at 
> about that same time :-)   He ended up changing several things in the 
> code-base that made it difficult to merge-in the changes.   Some of the 
> bug-fixes and memory-leak patches, and tests did get into the code-base, but 
> the essential creation of the NumPy library did not make it.   There was some 
> very good work done that I hope we can still take advantage of.

> Another factor.   the decision to make an extra layer of indirection makes 
> small arrays that much slower.   I agree with Mark that in a core library we 
> need to go the other way with small arrays being completely allocated in the 
> data-structure itself (reducing the number of pointer de-references

Does that imply there was a review of the refactor at some point to do
things like benchmarking?   Are there any sources to get started
trying to understand the nature of the Numpy refactor and where it ran
into trouble?  Was it just the small arrays?

> So, Cython did not play a major role on the NumPy side of things.   It played 
> a very nice role on the SciPy side of things.

I guess Cython was attractive because the desire was to make a
stand-alone library?   If that is still the goal, presumably that
excludes Cython from serious consideration?  What are the primary
advantages of making the standalone library?  Are there any serious
disbenefits?

Thanks a lot for the reply,

Matthew

Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Sturla Molden
Den 19.02.2012 01:12, skrev Nathaniel Smith:
>
> I don't oppose it, but I admit I'm not really clear on what the
> supposed advantages would be. Everyone seems to agree that
>-- Only a carefully-chosen subset of C++ features should be used
>-- But this subset would be pretty useful
> I wonder if anyone is actually thinking of the same subset :-).

Probably not; everybody has their own favourite subset.


>
> Chuck mentioned iterators as one advantage. I don't understand, since
> iterators aren't even a C++ feature, they're just objects with "next"
> and "dereference" operators. The only difference between these is
> spelling:
>for (my_iter i = foo.begin(); i != foo.end(); ++i) { ... }
>for (my_iter i = my_iter_begin(foo); !my_iter_ended(&i);
> my_iter_next(&i)) { ... }
> So I assume he's thinking about something more, but the discussion has
> been too high-level for me to figure out what.

C++11 has this option:

for (auto& item : container) {
 // iterate over the container object,
 // get a reference to each item
 //
 // "container" can be an STL class or
 // A C-style array with known size.
}

Which does this:

for item in container:
 pass


> Using C++ templates to generate ufunc loops is an obvious application,
> but again, in the simple examples

Template metaprogramming?

Don't even think about it. It is brain dead to try to outsmart the compiler.



Sturla






Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Charles R Harris
On Sat, Feb 18, 2012 at 5:12 PM, Nathaniel Smith  wrote:

> On Sat, Feb 18, 2012 at 10:54 PM, Travis Oliphant 
> wrote:
> > I'm reading very carefully any arguments against using C++ because I've
> actually pushed back on Mark pretty hard as we've discussed these things
> over the past months.  I am nervous about corner use-cases that will be
> unpleasant for some groups and some platforms.But, that vague
> nervousness is not enough to discount the clear benefits.   I'm curious
> about the state of C++ compilers for Blue-Gene and other big-iron machines
> as well.   My impression is that most of them use g++.   which has pretty
> good support for C++.David and others raised some important concerns
> (merging multiple compilers seems like the biggest issue --- it already
> is...).If someone out there seriously opposes judicious and careful use
> of C++ and can show a clear reason why it would be harmful --- feel free to
> speak up at any time.   We are leaning that way with Mark out in front of
> us leading the charge.
>
> I don't oppose it, but I admit I'm not really clear on what the
> supposed advantages would be. Everyone seems to agree that
>  -- Only a carefully-chosen subset of C++ features should be used
>  -- But this subset would be pretty useful
> I wonder if anyone is actually thinking of the same subset :-).
>
> Chuck mentioned iterators as one advantage. I don't understand, since
> iterators aren't even a C++ feature, they're just objects with "next"
> and "dereference" operators. The only difference between these is
> spelling:
>  for (my_iter i = foo.begin(); i != foo.end(); ++i) { ... }
>  for (my_iter i = my_iter_begin(foo); !my_iter_ended(&i);
> my_iter_next(&i)) { ... }
> So I assume he's thinking about something more, but the discussion has
> been too high-level for me to figure out what.
>

They are classes, data with methods in one cute little bundle.


> Using C++ templates to generate ufunc loops is an obvious application,
> but again, in the simple examples I'm thinking of (e.g., the stuff in
> numpy/core/src/umath/loops.c.src), this pretty much comes down to
> whether we want to spell the function names like "SHORT_add" or
> "add", and write the code like "((T *)x)[0] + ((T *)y)[0]" or
> "((@TYPE@ *)x)[0] + ((@TYPE@ *)y)[0]". Maybe there are other places
> where we'd get some advantage from the compiler knowing what was going
> on, like if we're doing type-based dispatch to overloaded functions,
> but I don't know if that'd be useful for the templates we actually
> use.
>
> RAII is pretty awesome, and RAII smart-pointers might help a lot with
> getting reference-counting right. OTOH, you really only need RAII if
> you're using exceptions; otherwise, the goto-failure pattern usually
> works pretty well, esp. if used systematically.
>
>
That's more like having destructors. Let the compiler do it, part of useful
code abstraction is to hide those sort of sordid details.


> Do we know that the Python memory allocator plays well with the C++
> allocation interfaces on all relevant systems? (Potentially you have
> to know for every pointer whether it was allocated by new, new[],
> malloc, or PyMem_Malloc, because they all have different deallocation
> functions. This is already an issue for malloc versus PyMem_Malloc,
> but C++ makes it worse.)
>
>
I think the low level library will ignore the Python memory allocator, but
there is a template for allocators that makes them selectable.


> Again, it really doesn't matter to me personally which approach is
> chosen. But getting more concrete might be useful...
>
>
Agreed. I think much will be clarified once there is some actual code to
look at.

Chuck


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Nathaniel Smith
On Sat, Feb 18, 2012 at 11:09 PM, David Cournapeau  wrote:
> On Sat, Feb 18, 2012 at 10:50 PM, Sturla Molden  wrote:
>
>>  > In an ideal world, we would have a better language than C++ that can
>> be spit out as > C for portability.
>>
>> What about a statically typed Python? (That is, not Cython.) We just
>> need to make the compiler :-)
>
> There are better languages than C++ that have most of the technical
> benefits stated in this discussion (rust and D being the most
> "obvious" ones), but whose usage is unrealistic today for various
> reasons: knowledge, availability on "esoteric" platforms, etc… A new
> language is completely ridiculous.

Off-topic: rust is an obvious one? That makes my day, Graydon is an
old friend and collaborator :-). But FYI, it wouldn't be relevant
anyway; its emphasis on concurrency means that it can easily call C,
but you can't really call it from C -- it needs to "own" the overall
runtime. And I failed to convince him to add numerical-array-relevant
features like operator overloading to make it more convenient for
numerical programmers attracted by the concurrency support :-(.

There are some very small values of "new language" that might be
relevant alternatives, like -- if templates are the big draw for C++,
then making the existing code generators suck less might do just as
well, while avoiding the build system and portability hassles of C++.
*shrug*

-- Nathaniel


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Nathaniel Smith
On Sat, Feb 18, 2012 at 10:54 PM, Travis Oliphant  wrote:
> I'm reading very carefully any arguments against using C++ because I've 
> actually pushed back on Mark pretty hard as we've discussed these things over 
> the past months.  I am nervous about corner use-cases that will be unpleasant 
> for some groups and some platforms.    But, that vague nervousness is not 
> enough to discount the clear benefits.   I'm curious about the state of C++ 
> compilers for Blue-Gene and other big-iron machines as well.   My impression 
> is that most of them use g++, which has pretty good support for C++.
> David and others raised some important concerns (merging multiple compilers 
> seems like the biggest issue --- it already is...).    If someone out there 
> seriously opposes judicious and careful use of C++ and can show a clear 
> reason why it would be harmful --- feel free to speak up at any time.   We 
> are leaning that way with Mark out in front of us leading the charge.

I don't oppose it, but I admit I'm not really clear on what the
supposed advantages would be. Everyone seems to agree that
  -- Only a carefully-chosen subset of C++ features should be used
  -- But this subset would be pretty useful
I wonder if anyone is actually thinking of the same subset :-).

Chuck mentioned iterators as one advantage. I don't understand, since
iterators aren't even a C++ feature, they're just objects with "next"
and "dereference" operators. The only difference between these is
spelling:
  for (my_iter i = foo.begin(); i != foo.end(); ++i) { ... }
  for (my_iter i = my_iter_begin(foo); !my_iter_ended(&i); my_iter_next(&i)) { ... }
So I assume he's thinking about something more, but the discussion has
been too high-level for me to figure out what.

Using C++ templates to generate ufunc loops is an obvious application,
but again, in the simple examples I'm thinking of (e.g., the stuff in
numpy/core/src/umath/loops.c.src), this pretty much comes down to
whether we want to spell the function names like "SHORT_add" or
"add", and write the code like "((T *)x)[0] + ((T *)y)[0]" or
"((@TYPE@ *)x)[0] + ((@TYPE@ *)y)[0]". Maybe there are other places
where we'd get some advantage from the compiler knowing what was going
on, like if we're doing type-based dispatch to overloaded functions,
but I don't know if that'd be useful for the templates we actually
use.

RAII is pretty awesome, and RAII smart-pointers might help a lot with
getting reference-counting right. OTOH, you really only need RAII if
you're using exceptions; otherwise, the goto-failure pattern usually
works pretty well, esp. if used systematically.

Do we know that the Python memory allocator plays well with the C++
allocation interfaces on all relevant systems? (Potentially you have
to know for every pointer whether it was allocated by new, new[],
malloc, or PyMem_Malloc, because they all have different deallocation
functions. This is already an issue for malloc versus PyMem_Malloc,
but C++ makes it worse.)

Again, it really doesn't matter to me personally which approach is
chosen. But getting more concrete might be useful...

-- Nathaniel


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Sturla Molden
Den 18.02.2012 23:54, skrev Travis Oliphant:
> Another factor.   the decision to make an extra layer of indirection makes 
> small arrays that much slower.   I agree with Mark that in a core library we 
> need to go the other way with small arrays being completely allocated in the 
> data-structure itself (reducing the number of pointer de-references).
>

I am not sure there is much overhead to

double *const data = (double*)PyArray_DATA(array);

If C code calls PyArray_DATA(array) more than needed, the fix is not to 
store the data inside the struct, but rather fix the real problem. For 
example, the Cython syntax for NumPy arrays will under the hood unbox 
the ndarray struct into local variables. That gives the fastest data 
access. The NumPy core could e.g. have macros that take care of the 
unboxing.

But for the purpose of cache use, it could be smart to make sure the 
data buffer is allocated directly after the PyObject struct (or at least 
in vicinity of it), so it will be loaded into cache along with the 
PyObject. That is, prefetched before dereferencing PyArray_DATA(array). 
But with respect to placement we must keep in mind that the PyObject can 
be subclassed. Putting e.g. 4 kb of static buffer space inside the 
PyArrayObject struct will bloat every ndarray.

Sturla


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Charles R Harris
On Sat, Feb 18, 2012 at 3:24 PM, David Cournapeau wrote:

> On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris
>  wrote:
>
> >
> > Well, we already have code obfuscation (DOUBLE_your_pleasure,
> > FLOAT_your_boat), so we might as well let the compiler handle it.
>
> Yes, those are not great, but on the other hand, it is not that
> fundamental an issue IMO.
>

"Name mangling" is what I meant. But C++ does exactly the same thing, just
more systematically. It's not whether it's great, it's whether the compiler
or the programmer does the boring stuff.



Chuck


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Sturla Molden
Den 19.02.2012 00:33, skrev Sturla Molden:
> Or just write everything in Cython, even the core?

That is, use memory view syntax and fused types for generics, and hope 
it is stable before we are done ;-)

Sturla



Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Sturla Molden
Den 19.02.2012 00:09, skrev David Cournapeau:
> There are better languages than C++ that have most of the technical 
> benefits stated in this discussion (rust and D being the most 
> "obvious" ones),

What about Java? (compile with GJC for CPython)

Or just write everything in Cython, even the core?

Sturla







Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Sturla Molden
Den 19.02.2012 00:09, skrev David Cournapeau:
> reasons: knowledge, availability on "esoteric" platforms, etc… A new
> language is completely ridiculous.

Yes, that is why I argued against Cython as well. Personally I prefer 
C++ to C, but only if it is written in a readable way. And if the 
purpose is to write C in C++, then it's brain dead.

Sturla









Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread David Cournapeau
On Sat, Feb 18, 2012 at 10:50 PM, Sturla Molden  wrote:

>  > In an ideal world, we would have a better language than C++ that can
> be spit out as C for portability.
>
> What about a statically typed Python? (That is, not Cython.) We just
> need to make the compiler :-)

There are better languages than C++ that have most of the technical
benefits stated in this discussion (rust and D being the most
"obvious" ones), but whose usage is unrealistic today for various
reasons: knowledge, availability on "esoteric" platforms, etc… A new
language is completely ridiculous.

David


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread Matthew Brett
Hi,

On Sat, Feb 18, 2012 at 2:51 PM, Robert Kern  wrote:
> On Sat, Feb 18, 2012 at 22:29, Matthew Brett  wrote:
>> Hi,
>>
>> On Sat, Feb 18, 2012 at 2:20 PM, Robert Kern  wrote:
>>> On Sat, Feb 18, 2012 at 22:06, Matthew Brett  
>>> wrote:
 Hi,

 On Sat, Feb 18, 2012 at 2:03 PM, Robert Kern  wrote:
>
> Your misunderstanding of what was being discussed. The proposal being
> discussed is implementing the core of numpy in C++, wrapped in C to be
> usable as a C library that other extensions can use, and then exposed
> to Python in an unspecified way. Cython was raised as an alternative
> for this core, but as Chuck points out, it doesn't really fit. Your
> assertion that what was being discussed was putting the core in C and
> using Cython to wrap it was simply a non-sequitur. Discussion of
> alternatives is fine. You weren't doing that.

 You read David's email?  Was he also being annoying?
>>>
>>> Not really, because he was responding on-topic to the bizarro-branch
>>> of the conversation that you spawned about the merits of moving from
>>> hand-written C extensions to a Cython-wrapped C library. Whatever
>>> annoyance his email might inspire is your fault, not his. The
>>> discussion was about whether to use C++ or Cython for the core. Chuck
>>> argued that Cython was not a suitable implementation language for the
>>> core. You responded that his objections to Cython didn't apply to what
>>> you thought was being discussed, using Cython to wrap a pure-C
>>> library. As Pauli (Wolfgang, not our Pauli) once phrased it, you were
>>> "not even wrong". It's hard to respond coherently to someone who is
>>> breaking the fundamental expectations of discourse. Even I had to
>>> stare at the thread for a few minutes to figure out where things went
>>> off the rails.
>>
>> I'm sorry but this seems to me to be aggressive, offensive, and unjust.
>>
>> The discussion was, from the beginning, mainly about the relative
>> benefits of rewriting the core with C / Cython, or C++.
>>
>> I don't think anyone was proposing writing every line of the numpy
>> core in Cython.  Ergo (sorry to use the debating term), the proposal
>> to use Cython was always to take some of the higher level code out of
>> C and leave some of it in C.   It does indeed make the debate
>> ridiculous to oppose a proposal that no-one has made.
>>
>> Now I am sure it is obvious to you, that the proposal to refactor the
>> current C code to into low-level C libraries, and higher level Cython
>> wrappers, is absurd and off the table.  It isn't obvious to me.  I
>> don't think I broke a fundamental rule of polite discourse to clarify
>> that is what I meant,
>
> It's not off the table, but it's not what this discussion was about.

I beg to differ - which was why I replied the way I did.  As I see it
the two proposals being discussed were:

1) C++ rewrite of C core
2) Refactor current C core into C / Cython

I think you can see from David's reply that that was also his
understanding.  Of course you could use Cython to interface to the
'core' in C or the 'core' in C++, but the difference would be, that
some of the stuff in C++ for option 1) would be in Cython, in option
2).

Now you might be saying, that you believe the discussion was only ever
about whether the non-Cython bits would be in C or C++.  That would
indeed make sense of your lack of interest in discussion of Cython.  I
think you'd be hard pressed to claim it was only me discussing Cython
though.

Chuck was pointing out that it was completely ridiculous trying to
implement the entire core in Cython.  Yes it is.  As no-one has
proposed that, it seems to me only reasonable to point out what I
meant, in the interests of productive discourse.

Best,

Matthew

