On Thu, 28 May 2009 13:36:28 -0400, Denis Koroskin <2kor...@gmail.com> wrote:

On Thu, 28 May 2009 21:07:57 +0400, Robert Jacques <sandf...@jhu.edu> wrote:

On Thu, 28 May 2009 12:45:41 -0400, Denis Koroskin <2kor...@gmail.com>
wrote:

On Thu, 28 May 2009 20:32:29 +0400, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> wrote:

BCS wrote:
Everything is indicating that shared memory multi-threading is where
it's all going.

That is correct, just that it's 40 years late. Right now everything is
indicating that things are moving *away* from shared memory.

Andrei

That's true.

For example, we develop for PS3, and its 7 SPU cores each have 256 KiB of local store (which is as fast as L2 cache) and no direct access to shared memory. Shared memory has to be pulled in via asynchronous memcpy (DMA) requests, and this scheme doesn't work well with OOP: even after you transfer an object, its vtbl and other embedded pointers still point into shared memory.
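
(Roughly, the SPU side looks something like the sketch below. This assumes the Cell SDK's spu_mfcio.h intrinsics; the buffer size and effective address are just for illustration, not our actual code.)

    #include <stdint.h>
    #include <spu_mfcio.h>

    /* SPU local-store buffer; DMA targets want 128-byte alignment. */
    static char local_buf[16384] __attribute__((aligned(128)));

    void fetch_chunk(uint64_t main_mem_ea)
    {
        /* Kick off an asynchronous DMA "get" from main memory into
         * local store, using tag group 0. */
        mfc_get(local_buf, main_mem_ea, sizeof(local_buf), 0, 0, 0);

        /* ...do other work while the transfer is in flight... */

        /* Block until every transfer in tag group 0 has completed. */
        mfc_write_tag_mask(1 << 0);
        mfc_read_tag_status_all();
    }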

We had a hard time re-arranging our data so that an object and everything it owns (and points to) is stored sequentially in a single large block of memory. This also resulted in replacing most of the pointers with relative offsets.
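
(A stripped-down sketch of what the offset scheme looks like, with made-up Node/Block names. Because every reference is an offset from the start of the block, the whole block can be memcpy'd/DMA'd anywhere and the references stay valid with no fix-up pass.)

    #include <stddef.h>
    #include <stdint.h>

    typedef struct Node {
        int32_t  value;
        uint32_t next_offset;   /* offset from block start; 0 = no next node */
    } Node;

    typedef struct Block {
        uint32_t first_offset;  /* offset of the first Node from block start */
        /* node data follows in the same contiguous allocation */
    } Block;

    /* Resolve an offset into a pointer, valid wherever the block lives. */
    static Node *node_at(Block *block, uint32_t offset)
    {
        return offset ? (Node *)((char *)block + offset) : NULL;
    }

    /* Walks the list identically on the original copy and on any
     * transferred copy in SPU local store. */
    static int32_t sum(Block *block)
    {
        int32_t total = 0;
        for (Node *n = node_at(block, block->first_offset); n != NULL;
             n = node_at(block, n->next_offset))
            total += n->value;
        return total;
    }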

Parallelization is hard, but the result is worth the trouble.

I agree that Andrei's right, but your example is wrong. The Cell's SPUs are SIMD vector processors, not general-purpose CPUs. I also work with vector processors (NVIDIA's CUDA), but every software/hardware iteration gets further and further away from pure vector processing. Rumor has it that NVIDIA's next chip will be MIMD instead of SIMD.

I wanted to stress that multicore PUs tend to have their own local memory (small but fast) and little or no global (shared) memory access (it is inefficient and error-prone: race conditions et al.).

I believe the SIMD/MIMD distinction is irrelevant here. It's all about the shared vs. distributed memory model. MIMD devices can be either (http://en.wikipedia.org/wiki/MIMD).

Well, I thought you were making a different point. Really, the Cell SPU is the only current PU with the design you're talking about. All commercial CPUs and GPUs have very large global memory buses. Every blog and talk I've read/attended has painted the SPU in a very negative light, at least with regard to the programming model. (Which makes sense, since it's sorta like non-cache-coherent NUMA, which pretty much everyone decided is a bad idea.)