Stefan Behnel wrote:
> Hi Dag,
> 
> thanks for writing this up (and for insisting :).

It doesn't seem like I had much choice -- the main, concentrated period 
I have for Cython this summer, where I can work on totally new stuff, is 
about a week away, and ideally Kurt and I would have gone through some 
implementation details and work sharing by then. If a decision can't be 
reached very soon I fear I'll have to drop it anyway, making the 
discussion kind of moot.

> Every addition that requires major syntax support will almost always invoke
> a discussion, as seen here. IMHO, getting a feature without syntax impact
> is a lot better than yet another way of overloading and extending some
> bracket syntax.
> 
> Regarding CEP 517, I think that a 'native', memory managed array type would
> be very helpful in a fair lot of use cases. Plus, (most of) this can be
> done in an external module. So that's a perfect new feature. Solving ticket
> 152 (PyVarObject support) first would be helpful to this end.

...and adding support for creating custom PyVarObjects in Cython through 
special syntax perhaps.


> Regarding CEP 518, I do see the interest, and I do see why this cannot be
> done externally. But I also see that there were doubts that this is enough
> to be truly useful except for a number of cases. And it has some impact on
> the language.

It is useful in a lot of numerical cases. I just benchmarked: Doing 
sqrt(x*x+y*y) with a CEP 518 approach is 3 times faster on my machine 
than NumPy (using only whatever gcc does to my naive loop and a single 
thread).
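
To make the comparison concrete (this is not the benchmark itself): 
operation-by-operation whole-array evaluation produces a temporary array 
for each step, whereas a CEP 518 style implementation could compile the 
whole expression into one loop. A minimal sketch in plain Python lists, 
with made-up function names; it only shows the two evaluation orders and 
says nothing about speed in pure Python:

```python
import math

def hypot_with_temporaries(x, y):
    # One full pass (and one temporary list) per operation, mimicking
    # operation-by-operation whole-array evaluation.
    xx = [a * a for a in x]
    yy = [b * b for b in y]
    s = [p + q for p, q in zip(xx, yy)]
    return [math.sqrt(v) for v in s]

def hypot_fused(x, y):
    # A single pass with no intermediate arrays: the kind of loop a
    # compiler could generate for the whole expression.
    return [math.sqrt(a * a + b * b) for a, b in zip(x, y)]

x = [3.0, 5.0, 8.0]
y = [4.0, 12.0, 15.0]
print(hypot_with_temporaries(x, y))
print(hypot_fused(x, y))
```

Both functions return the same values; only the number of passes over 
the data differs.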

It's the chicken and egg problem again: Currently Python/NumPy is seen 
as a MATLAB replacement; with this, Python/NumPy could start to compete 
with FORTRAN (given some years of interest in Cython from the numerical 
community). These CEPs IMO sketch pretty much everything that makes 
people still cling to FORTRAN, feature-wise.

Current standard practice is to write the heavy stuff in FORTRAN or C++ 
and then wrap it for use from Python; that's an entirely new "user 
segment" for whom Cython is currently not so much of an option.

> We had a discussion about parallelism in the language a while ago, which
> lead to this write-up:
> 
> http://wiki.cython.org/enhancements/parallel
> 
> Nothing happened since then, as ticket 211 shows.

Thread parallelism is IMO orthogonal. Even serial implementations of CEP 
518 would be very useful because of reduced memory bus traffic (see the 
3x speed increase above). In some situations parallel implementations 
may not even help (if too little happens on the CPU relative to memory 
bandwidth); thus CEP 518 in serial is needed even to get to the point 
where the load on memory bandwidth is reduced.
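
A back-of-envelope model of the memory traffic argument, for 
sqrt(x*x+y*y). It assumes the arrays are far larger than cache, so every 
operation reads its inputs from and writes its output to main memory. 
The function name and the cost model are illustrative assumptions, not 
measurements:

```python
def element_transfers(n_elements, fused):
    # Count element loads + stores for sqrt(x*x + y*y), assuming nothing
    # stays in cache between passes (a deliberately crude model).
    if fused:
        # read x, read y, write result
        return 3 * n_elements
    # x*x: 1 read + 1 write; y*y: 1 read + 1 write;
    # add: 2 reads + 1 write; sqrt: 1 read + 1 write
    return (2 + 2 + 3 + 2) * n_elements

n = 10_000_000
ratio = element_transfers(n, fused=False) / element_transfers(n, fused=True)
print(ratio)
```

Under this model the unfused evaluation moves three times as many 
elements. That is consistent with the figure quoted above, but it is 
only a model and may not be the whole explanation.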

In my own work I'd use pretty coarse-grained MPI and run the program on 
100 cluster nodes at the same time (each with only 4 cores); shared 
memory parallelism in the language thus buys me personally absolutely 
nothing. (Each numerical user is different here, though; I'm just the 
one who has been most heavily involved in Cython.)

> I like the idea of parallelism in the language in general, and I'm not
> opposed to CEP 518. I'd just like to see it added in a way that clearly
> separates it from the 'normal' array type.

Robert proposed that int[:,:] would be just that special numerics type, 
and that [int] or int[] would be a normal array type (perhaps even 
[[int]] for 2D). The int[:,:] notation has a lot of possibilities for 
embedding information about how each dimension is stored which is 
primarily useful in numerics -- specifying "int[:,::1]" so that it is 
known at compile time that the second dimension is contiguous can double 
the speed in an in-cache numerical situation.
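
For readers less used to strides: the "::1" in the proposed notation 
marks a dimension whose elements sit next to each other in memory. The 
sketch below uses Python's built-in memoryview purely as a stand-in for 
the proposed type (byte_offset is a made-up helper) to show what the 
compiler would know in that case:

```python
from array import array

# Build a 2 x 3 C-ordered view over a flat buffer of C ints.
flat = array('i', range(6))
view = memoryview(flat).cast('B').cast('i', shape=[2, 3])

def byte_offset(i, j, strides):
    # General strided addressing: one multiply per dimension.
    return i * strides[0] + j * strides[1]

print(view.strides)       # (row stride, element stride) in bytes
print(view.c_contiguous)

# With a contiguous last dimension the inner stride equals the item
# size, so stepping j -> j + 1 is a plain pointer increment.
inner_is_unit_stride = view.strides[1] == view.itemsize
last_offset = byte_offset(1, 2, view.strides)
print(inner_is_unit_stride, last_offset // view.itemsize, view[1, 2])
```

When the inner stride is only known at run time, the generated loop has 
to multiply by it on every step; when it is declared contiguous, that 
multiply can be dropped.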

Really, I think this is much about carving out a little room in the 
syntax where the significant numerical Cython community can play by 
themselves. int[:,:] doesn't seem to be something we'd have to tell 
non-numerics people about :-); and all the groundwork towards [int] 
would be done as a side-effect.

As Lisandro mentioned, for a numerical user, SIMD is the "normal" way of 
treating an array in practically every possible alternative language 
(and the reason C is out of the question for numerics).

> 
> What would be better for the intended use case: allow parallelism in the
> language in general (i.e. parallel loops), or have a parallel type that
> does some magic behind the scenes?

Assuming the intended use case is numerics, the latter is pretty much 
the only viable option IMO, if one has to make a choice. The former can 
be nice in some other situations, though.

As Kurt touched upon though, I don't think we can compete with 40 years 
of language development in the custom numerics languages. So I'm 
actually reluctant to have that discussion, and would hope to keep it 
to whether we can "make a little corner in the syntax where the numerics 
people can play alone" and avoid any bikeshedding.

-- 
Dag Sverre
_______________________________________________
Cython-dev mailing list
http://codespeak.net/mailman/listinfo/cython-dev
