Re: Thoughts on parallel programming?
Distributed programming is essentially a bunch of little sequential programs that interact, which is basically how people cooperate in the real world. I think that is by far the most intuitive of any concurrent programming model, though it's still a significant conceptual shift from the traditional monolithic imperative program.

The Erlang people seem to say that a lot. The thing they omit to say, though, is that it is very, very difficult in the real world! Consider managing a team of ten people. Getting them to be ten times as productive as a single person is extremely difficult -- virtually impossible, in fact.

That's only part of the reasoning behind all of the little programs in Erlang. One of the more important aspects is the concept of supervisor trees, where you have processes that monitor* other processes. In the event that a child process fails, the parent process will try to perform a simpler version of what needs to occur until it is successful. The other aspect is the concept of failing fast: it is assumed that a process that fails does not know how to resolve the issue, therefore it should just stop running and allow the parent process to do the right thing.

If you build your software the Erlang way, then you implicitly build software that is multi-core friendly. How well it uses multiple cores depends on the software that is written; however, I believe that Erlang is supposed to be better than most other languages at obtaining something close to linear scaling across cores. Not 100% sure, though.

Does this mean that I believe distributed programming is easy in Erlang? Well, that depends on what you're doing, but I will say that being able to spawn functions on different machines is dirt simple. Doing it efficiently... well, that's where I think the programmer needs to know what they're doing.

Casey

* The monitoring is something implicit to the language.
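The supervise-and-restart pattern described above is easy to sketch outside Erlang too; here is a rough Python analogue (the names `supervise`, `flaky_worker` and `simple_fallback` are invented for illustration, and a real Erlang supervisor would of course run the child as a separate monitored process rather than an in-process call):

```python
def flaky_worker(attempt):
    # Hypothetical child task. It "fails fast": rather than trying to
    # repair its own state, it raises and lets the parent decide.
    if attempt < 2:
        raise RuntimeError("worker failed on attempt %d" % attempt)
    return "full result"

def simple_fallback():
    # The simpler version of the work that the supervisor falls back
    # to once the full computation keeps failing.
    return "degraded result"

def supervise(worker, max_restarts=3):
    # The "supervisor": run the child and restart it on failure.
    # After max_restarts failures, do the simpler thing instead.
    for attempt in range(max_restarts):
        try:
            return worker(attempt)
        except RuntimeError:
            continue  # child crashed; restart it
    return simple_fallback()

print(supervise(flaky_worker))  # attempts 0 and 1 fail, attempt 2 succeeds
```

The key design point mirrored here is that the recovery policy lives in the parent, not the child: the child only computes or crashes.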
Re: Thoughts on parallel programming?
True enough. But it's certainly more natural to think about than mutex-based concurrency, automatic parallelization, etc. In the long term there may turn out to be better models, but I don't know of one today. Also, there are other goals for such a design than increasing computation speed: decreased maintenance cost, system reliability, etc.

Erlang processes are equivalent to objects in C++ or Java, with the added benefit of asynchronous execution in instances where an immediate response (i.e. RPC) is not required. Performance gain is a direct function of how often this is true. But even where it's not, the other benefits exist.

I like that description! Casey
Re: Thoughts on parallel programming?
On 11/12/2010 12:44 AM, dsimcha wrote:

== Quote from Tobias Pfaff (nos...@spam.no)'s article

On 11/11/2010 08:10 PM, Russel Winder wrote: On Thu, 2010-11-11 at 18:24 +0100, Tobias Pfaff wrote: [ . . . ] Unfortunately I only know about the standard stuff, OpenMP/OpenCL... Speaking of which: Are there any attempts to support lightweight multithreading in D, that is, something like OpenMP ?

I'd hardly call OpenMP lightweight. I agree that as a meta-notation for directing the compiler how to insert appropriate code to force multithreading of certain classes of code, using OpenMP generally beats manual coding of the threads. But OpenMP is very Fortran oriented even though it can be useful for C, and indeed C++ as well. However, given things like Threading Building Blocks (TBB) and the functional-programming-inspired techniques used by Chapel, OpenMP increasingly looks like a hack rather than a solution. Using parallel versions of for, map, filter, reduce in the language is probably a better way forward. Having a D binding to OpenCL (and OpenGL, MPI, etc.) is probably going to be a good thing.

Well, I am looking for an easy, efficient way to perform parallel numerical calculations on our 4-8 core machines. With C++, that's OpenMP (or GPGPU stuff using CUDA/OpenCL) for us now. Maybe lightweight was the wrong word; what I meant is that OpenMP is easy to use, and efficient for the problems we are solving. There actually might be better tools for that; honestly we didn't look into that many options -- we are no HPC guys, 1000-CPU clusters are not a relevant scenario, and we are happy that we even started parallelizing our code at all :) Anyway, I was thinking about the logical thing to use in D for this scenario. It's nothing super-fancy, in most cases just a parallel_for, and sometimes a map/reduce operation... Cheers, Tobias

I think you'll be very pleased with std.parallelism when/if it gets into Phobos.
The design philosophy is exactly what you're looking for: simple shared-memory parallelism on multicore computers, assuming no fancy/unusual OS-, compiler- or hardware-level infrastructure. Basically, it's got parallel foreach, parallel map, parallel reduce and parallel tasks. All you need to fully utilize it is DMD and a multicore PC. As a reminder, the docs are at http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html and the code is at http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/std_parallelism.d . If this doesn't meet your needs in its current form, I'd like as much constructive criticism as possible, as long as it's within the scope of simple, everyday parallelism without fancy infrastructure.

I did a quick test of the module; it looks really good so far, thanks for providing this! (Is this module scheduled for inclusion in phobos2?) If I find issues with it I'll let you know.
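For readers coming from other languages, the parallel map/reduce style that std.parallelism targets can be imitated with a stock pool from any standard library; a minimal Python sketch (the function `f` is a made-up stand-in for a numerical kernel, and for pure-Python CPU-bound work a `ProcessPoolExecutor` would be the choice instead, to sidestep the GIL):

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

def f(x):
    # Stand-in for a CPU-bound numerical kernel.
    return x * x

data = list(range(10))

with ThreadPoolExecutor(max_workers=4) as pool:
    # Parallel map: f is applied to elements of data on pool workers.
    squares = list(pool.map(f, data))

# The combine step is cheap here, so the reduce is done serially.
total = reduce(operator.add, squares)
print(total)  # 0^2 + 1^2 + ... + 9^2 = 285
```

The appeal, in D as here, is that the sequential version (`map(f, data)` plus a serial fold) and the parallel version differ only in who executes the map.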
Re: Thoughts on parallel programming?
On 12-nov-10, at 00:29, Tobias Pfaff wrote: [...] Well, I am looking for an easy, efficient way to perform parallel numerical calculations on our 4-8 core machines. With C++, that's OpenMP (or GPGPU stuff using CUDA/OpenCL) for us now. Maybe lightweight was the wrong word; what I meant is that OpenMP is easy to use, and efficient for the problems we are solving. There actually might be better tools for that; honestly we didn't look into that many options -- we are no HPC guys, 1000-CPU clusters are not a relevant scenario, and we are happy that we even started parallelizing our code at all :) Anyway, I was thinking about the logical thing to use in D for this scenario. It's nothing super-fancy, in most cases just a parallel_for, and sometimes a map/reduce operation...

If you use D1, blip.parallel.smp offers that, and it does scale well to 4-8 cores.
Re: Thoughts on parallel programming?
On 11-nov-10, at 20:41, Russel Winder wrote: On Thu, 2010-11-11 at 15:16 +0100, Fawzi Mohamed wrote: [ . . . ] on this I am not so sure, heterogeneous clusters are more difficult to program, and GPUs and co. are slowly becoming more and more general purpose. Being able to take advantage of those is useful, but I am not convinced they are necessarily the future.

The Intel roadmap is for processor chips that have a number of cores with different architectures. Heterogeneity is not going to be a choice, it is going to be an imposition. And this is at bus level, not at cluster level.

Vector co-processors, yes, I see that, and short term the effect of things like AMD Fusion (CPU/GPU merging). Is this necessarily the future? I don't know; neither does Intel, I think, as they are still evaluating Larrabee. But CPU/GPU will stay around for some time more, for sure.

[ . . . ] yes, many-core is the future, I agree on this, and also that a distributed approach is the only way to scale to a really large number of processors. But distributed systems *are* more complex, so I think that for the foreseeable future one will have a hybrid approach.

Hybrid is what I am saying is the future whether we like it or not. SMP as the whole system is the past. I disagree that distributed systems are more complex per se. I suspect comments are getting so general here that anything anyone writes can be seen as both true and false simultaneously. My perception is that shared memory multithreading is less and less a tool that applications programmers should be thinking in terms of. Multiple processes with a hierarchy of communications costs is the overarching architecture, with each process potentially being SMP or CSP or . . .

I agree that on not-too-large shared memory machines a hierarchy of tasks is the correct approach. This is what I did in blip.parallel.smp. Using that one can have fairly efficient automatic scheduling, and so forget most of the complexities and the actual hardware configuration.
again not sure the situation is as dire as you paint it, Linux does quite well in the HPC field... but I agree that to be the ideal OS for these architectures it will need more changes.

The Linux driver architecture is already creaking at the seams; it implies a central, monolithic approach to the operating system. This falls down in a multiprocessor shared memory context. The fact that the Top 500 generally use Linux is because it is the least worst option. M$, despite throwing large amounts of money at the problem, and indeed buying some very high profile names to try and do something about the lack of traction, have failed to make any headway in the HPC operating system stakes. Do you want to have to run a virus checker on your HPC system? My gut reaction is that we are going to see a rise of hypervisors as per Tilera chips, at least in the short to medium term, simply as a bridge from the present OSes to the future. My guess is that L4 microkernels and/or nanokernels, exokernels, etc. will find a central place in future systems. The problem to be solved is ensuring that the appropriate ABI is available on the appropriate core at the appropriate time. Mobility of ABI is the critical factor here.

yes, microkernels and co. will be more and more important (but I wonder how much this will be the case for the desktop). ABI mobility? Not so sure; for HPC I can imagine having to compile to different ABIs (but maybe that is what you mean with ABI mobility).

[ . . . ] Whole array operations are useful, and when possible one gains much using them; unfortunately not all problems can be reduced to a few large array operations. Data-parallel languages are not the main type of language for these reasons.

Agreed. My point was that in 1960s code people explicitly handled array operations using do loops because they had to. Nowadays such code is anathema to efficient execution.
My complaint here is that people have put effort into compiler technology instead of rewriting the codes in a better language and/or idiom. Clearly whole array operations only apply to algorithms that involve arrays!

[ . . . ] well, whole array operations are a generalization of the SPMD approach, so in this sense you said that that kind of approach will have a future (but with a more difficult optimization as the hardware is more complex).

I guess this is where the PGAS people are challenging things. Applications can be couched in terms of array algorithms which can be scattered across distributed memory systems. Inappropriate operations lead to huge inefficiencies, but handled correctly, code runs very fast.

About MPI, I think that many don't see what MPI really does: MPI offers a simplified parallel model. The main weakness of this model is that it assumes some kind of reliability, but then it offers a clear computational model with processors ordered in a linear or higher dimensional structure.
Re: Thoughts on parallel programming?
On 11-nov-10, at 20:10, Russel Winder wrote: On Thu, 2010-11-11 at 18:24 +0100, Tobias Pfaff wrote: [ . . . ] Unfortunately I only know about the standard stuff, OpenMP/OpenCL... Speaking of which: Are there any attempts to support lightweight multithreading in D, that is, something like OpenMP ?

I'd hardly call OpenMP lightweight. I agree that as a meta-notation for directing the compiler how to insert appropriate code to force multithreading of certain classes of code, using OpenMP generally beats manual coding of the threads. But OpenMP is very Fortran oriented even though it can be useful for C, and indeed C++ as well. However, given things like Threading Building Blocks (TBB) and the functional-programming-inspired techniques used by Chapel, OpenMP increasingly looks like a hack rather than a solution.

I agree. I think that TBB offers primitives for many kinds of parallelization, and is cleaner and more flexible than OpenMP, but in my opinion it has a big weakness: it cannot cope well with independent tasks. Coping well with both nested parallelism and independent tasks is a crucial thing for a generic solution that can be applied to several problems. This is missing, as far as I know, also from Chapel. I think that a solution that copes well with both nested parallelism and independent tasks is an excellent starting point on which to build almost all other higher-level parallelization schemes. It is important to handle this centrally, because the number of threads that one spawns should ideally stay limited to the number of execution units.
Re: Thoughts on parallel programming?
Sean Kelly wrote: Walter Bright Wrote: Russel Winder wrote: At the heart of all this is that programmers are taught that an algorithm is a sequence of actions to achieve a goal. Programmers are trained to think sequentially and this affects their coding. This means that parallelism has to be expressed at a sufficiently high level that programmers can still reason about algorithms as sequential things.

I think it's more than being trained to think sequentially. I think it is in the inherent nature of how we think.

Distributed programming is essentially a bunch of little sequential programs that interact, which is basically how people cooperate in the real world. I think that is by far the most intuitive of any concurrent programming model, though it's still a significant conceptual shift from the traditional monolithic imperative program.

The Erlang people seem to say that a lot. The thing they omit to say, though, is that it is very, very difficult in the real world! Consider managing a team of ten people. Getting them to be ten times as productive as a single person is extremely difficult -- virtually impossible, in fact.

I agree with Walter -- I don't think it's got much to do with programmer training. It's a problem that hasn't been solved in the real world in the general case. The analogy with the real world suggests to me that there are three cases that work well:
* massively parallel;
* _completely_ independent tasks; and
* very small teams.
Large teams are a management nightmare, and I see no reason to believe that wouldn't hold true for a large number of cores as well.
Re: Thoughts on parallel programming?
On Thu, 2010-11-11 at 02:24 +, jfd wrote: Any thoughts on parallel programming? I was looking at something about the Chapel and X10 languages etc. for parallelism, and it looks interesting. I know that it is still an area of active research, and it is not yet (far from?) done, but does anyone have thoughts on this as a future direction? Thank you.

Any programming language that cannot be used to program applications running on a heterogeneous collection of processors, including CPUs and GPUs as computational devices, on a single chip, with there being many such chips on a board, possibly clustered, doesn't have much of a future. Timescale 5--10 years. Intel's 80-core, 48-core and 50-core devices show the way server, workstation and laptop architectures are going. There may be a large central memory unit as now, but it will be secondary storage, not primary storage. All the chip architectures are shifting to distributed memory -- basically cache coherence is too hard a problem to solve, so instead of solving it, they are getting rid of it. Also the memory bus stops being the bottleneck for computations, which is actually the biggest problem with current architectures.

Windows, Linux and Mac OS X have a serious problem and will either die or be revolutionized. Apple at least recognize the issue, hence they pushed OpenCL.

Actor model, CSP, dataflow, and similar distributed memory/process-based architectures will become increasingly important for software. There will be an increasing move to declarative expression, but I doubt functional languages will ever make the mainstream. The issue here is that parallelism generally requires programmers not to try and tell the computer every detail of how to do something, but instead specify the start and end conditions and allow the runtime system to handle the realization of the transformation. Hence the move in Fortran from lots of do loops to whole array operations.
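The shift from explicit do loops to whole-array expressions is easy to show side by side; a small sketch using NumPy as a stand-in for Fortran 90 array syntax (assuming NumPy is available):

```python
import numpy as np

a = np.arange(8, dtype=np.float64)
b = np.arange(8, dtype=np.float64)

# 1960s style: an explicit element-by-element loop. The runtime sees
# one scalar operation at a time, so there is little for a compiler
# or library to vectorize or parallelize.
c_loop = np.empty_like(a)
for i in range(len(a)):
    c_loop[i] = 2.0 * a[i] + b[i]

# Whole-array style: one expression over entire arrays. The intent
# ("compute 2x + y everywhere") is visible as a single operation,
# which the implementation can realize with SIMD units or threads.
c_array = 2.0 * a + b

print(np.array_equal(c_loop, c_array))  # True
```

The two compute the same values; the difference is that the whole-array form states the start and end conditions and leaves the realization to the runtime, which is exactly the point being made above.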
MPI and all the SPMD approaches have a severely limited future, but I bet the HPC codes are still using Fortran and MPI in 50 years time.

You mentioned Chapel and X10, but don't forget the other one of the original three HPCS projects, Fortress. Whilst all three are PGAS (partitioned global address space) languages, Fortress takes a very different viewpoint compared to Chapel and X10.

The summary of the summary is: programmers will either be developing parallelism systems or they will be unemployed.

shameless-plug To hear more, I am doing a session on all this stuff for ACCU London 2010-11-18 18:30+00:00 http://skillsmatter.com/event/java-jee/java-python-ruby-linux-windows-are-all-doomed /shameless-plug

-- Russel.
Dr Russel Winder  t: +44 20 7585 2200  voip: sip:russel.win...@ekiga.net
41 Buckmaster Road  m: +44 7770 465 077  xmpp: rus...@russel.org.uk
London SW11 1EN, UK  w: www.russel.org.uk  skype: russel_winder
Re: Thoughts on parallel programming?
On 11-nov-10, at 09:58, Russel Winder wrote: On Thu, 2010-11-11 at 02:24 +, jfd wrote: Any thoughts on parallel programming? I was looking at something about the Chapel and X10 languages etc. for parallelism, and it looks interesting. I know that it is still an area of active research, and it is not yet (far from?) done, but does anyone have thoughts on this as a future direction? Thank you.

Any programming language that cannot be used to program applications running on a heterogeneous collection of processors, including CPUs and GPUs as computational devices, on a single chip, with there being many such chips on a board, possibly clustered, doesn't have much of a future. Timescale 5--10 years.

on this I am not so sure, heterogeneous clusters are more difficult to program, and GPUs and co. are slowly becoming more and more general purpose. Being able to take advantage of those is useful, but I am not convinced they are necessarily the future.

Intel's 80-core, 48-core and 50-core devices show the way server, workstation and laptop architectures are going. There may be a large central memory unit as now, but it will be secondary storage, not primary storage. All the chip architectures are shifting to distributed memory -- basically cache coherence is too hard a problem to solve, so instead of solving it, they are getting rid of it. Also the memory bus stops being the bottleneck for computations, which is actually the biggest problem with current architectures.

yes, many-core is the future, I agree on this, and also that a distributed approach is the only way to scale to a really large number of processors. But distributed systems *are* more complex, so I think that for the foreseeable future one will have a hybrid approach.

Windows, Linux and Mac OS X have a serious problem and will either die or be revolutionized. Apple at least recognize the issue, hence they pushed OpenCL.

again not sure the situation is as dire as you paint it, Linux does quite well in the HPC field...
but I agree that to be the ideal OS for these architectures it will need more changes.

Actor model, CSP, dataflow, and similar distributed memory/process-based architectures will become increasingly important for software. There will be an increasing move to declarative expression, but I doubt functional languages will ever make the mainstream. The issue here is that parallelism generally requires programmers not to try and tell the computer every detail of how to do something, but instead specify the start and end conditions and allow the runtime system to handle the realization of the transformation. Hence the move in Fortran from lots of do loops to whole array operations.

Whole array operations are useful, and when possible one gains much using them; unfortunately not all problems can be reduced to a few large array operations. Data-parallel languages are not the main type of language for these reasons.

MPI and all the SPMD approaches have a severely limited future, but I bet the HPC codes are still using Fortran and MPI in 50 years time.

well, whole array operations are a generalization of the SPMD approach, so in this sense you said that that kind of approach will have a future (but with a more difficult optimization as the hardware is more complex).

About MPI, I think that many don't see what MPI really does: MPI offers a simplified parallel model. The main weakness of this model is that it assumes some kind of reliability, but then it offers a clear computational model with processors ordered in a linear or higher dimensional structure and efficient collective communication primitives. Yes, MPI is not the right choice for all problems, but when usable it is very powerful, often superior to the alternatives, and programming with it is *simpler* than thinking about a generic distributed system. So I think that for problems that are not trivially parallel, or easily parallelizable, MPI will remain the best choice.
You mentioned Chapel and X10, but don't forget the other one of the original three HPCS projects, Fortress. Whilst all three are PGAS (partitioned global address space) languages, Fortress takes a very different viewpoint compared to Chapel and X10.

It might be a personal thing, but I am kind of suspicious toward PGAS; I find a generalized MPI model better than PGAS when you want to have separate address spaces. Using MPI one can define a PGAS-like object wrapping local storage with an object that sends remote requests to access remote memory pieces. This means having a local server where these wrapped objects can be published and that can respond at any moment to external requests. I call this rpc (remote procedure call) and it can be realized easily on top of MPI. As not all objects are distributed, and in a complex program it does not always make sense to distribute these objects on all processors or none, I find
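The wrapped-local-storage idea described above (a PGAS-like array built from per-rank storage plus a small server answering remote requests) can be mimicked on one machine with plain queues; a toy Python sketch, where `queue.Queue` stands in for the MPI transport and every name (`DistArray`, `serve`, `CHUNK`) is invented for illustration:

```python
import threading
import queue

CHUNK = 4  # elements owned by each "rank"

def serve(rank, local, requests, replies):
    # Server loop for one rank: answer get(index) requests for the
    # locally owned slice until told to stop (the rpc layer above).
    while True:
        idx = requests.get()
        if idx is None:
            break
        replies.put(local[idx - rank * CHUNK])

class DistArray:
    # PGAS-like wrapper: global indexing on top of per-rank storage.
    def __init__(self, nranks):
        self.requests = [queue.Queue() for _ in range(nranks)]
        self.replies = [queue.Queue() for _ in range(nranks)]
        self.threads = []
        for r in range(nranks):
            local = [r * CHUNK + i for i in range(CHUNK)]  # this rank's data
            t = threading.Thread(
                target=serve,
                args=(r, local, self.requests[r], self.replies[r]))
            t.start()
            self.threads.append(t)

    def get(self, i):
        owner = i // CHUNK           # which rank holds element i
        self.requests[owner].put(i)  # "remote" request
        return self.replies[owner].get()

    def shutdown(self):
        for q in self.requests:
            q.put(None)
        for t in self.threads:
            t.join()

arr = DistArray(nranks=3)
print(arr.get(0), arr.get(5), arr.get(11))  # 0 5 11
arr.shutdown()
```

The caller sees a single global index space while each element physically lives with exactly one owner, which is the essence of the PGAS-over-message-passing construction being argued for.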
Re: Thoughts on parallel programming?
On 11-nov-10, at 15:16, Fawzi Mohamed wrote: On 11-nov-10, at 09:58, Russel Winder wrote: On Thu, 2010-11-11 at 02:24 +, jfd wrote: Any thoughts on parallel programming? I was looking at something about the Chapel and X10 languages etc. for parallelism, and it looks interesting. I know that it is still an area of active research, and it is not yet (far from?) done, but does anyone have thoughts on this as a future direction? Thank you.

I just finished reading "Parallel Programmability and the Chapel Language" by Chamberlain, Callahan and Zima. A very nice read, and an overview of several languages and approaches. Still, I stand by my earlier view: an MPI-like approach is more flexible, but indeed having a nice parallel implementation of distributed arrays (which on MPI one can have using Global Arrays, for example) can be very useful. I think that a language like D can hide these behind wrapper objects, and reach, for these objects (which are not the only ones present in a complex parallel program), an expressivity similar to Chapel using the approach I have in blip. A direct implementation might be more efficient on shared memory machines, though.
Re: Thoughts on parallel programming?
On 11-nov-10, at 15:16, Fawzi Mohamed wrote: On 11-nov-10, at 09:58, Russel Winder wrote: MPI and all the SPMD approaches have a severely limited future, but I bet the HPC codes are still using Fortran and MPI in 50 years time.

well, whole array operations are a generalization of the SPMD approach, so in this sense you said that that kind of approach will have a future (but with a more difficult optimization as the hardware is more complex).

sorry, I translated that as SIMD, not SPMD, but the answer below still holds in my opinion: if one has a complex parallel problem MPI is a worthy contender; the thing is that on many occasions one doesn't need all its power. If a client-server, a distributed or a map/reduce approach works, then simpler and more flexible solutions are superior. That (and its reliability problem, which PGAS also shares) is, in my opinion, the reason MPI is not widely used outside the computational community. Being able to tackle MPMD as well in a good way can be useful, and that is what the rpc level does between computers, and the event-based scheduling within a single computer (ensuring that one processor can do meaningful work while the other waits).

About MPI, I think that many don't see what MPI really does: MPI offers a simplified parallel model. The main weakness of this model is that it assumes some kind of reliability, but then it offers a clear computational model with processors ordered in a linear or higher dimensional structure and efficient collective communication primitives. Yes, MPI is not the right choice for all problems, but when usable it is very powerful, often superior to the alternatives, and programming with it is *simpler* than thinking about a generic distributed system. So I think that for problems that are not trivially parallel, or easily parallelizable, MPI will remain the best choice.
Re: Thoughts on parallel programming?
On 11/11/2010 03:24 AM, jfd wrote: Any thoughts on parallel programming? I was looking at something about the Chapel and X10 languages etc. for parallelism, and it looks interesting. I know that it is still an area of active research, and it is not yet (far from?) done, but does anyone have thoughts on this as a future direction? Thank you.

Unfortunately I only know about the standard stuff, OpenMP/OpenCL... Speaking of which: Are there any attempts to support lightweight multithreading in D, that is, something like OpenMP ? Thanks!
Re: Thoughts on parallel programming?
Unfortunately I only know about the standard stuff, OpenMP/OpenCL... Speaking of which: Are there any attempts to support lightweight multithreading in D, that is, something like OpenMP ?

That would require compiler support for it. Other than that, there only seems to be dsimcha's std.parallelism.
Re: Thoughts on parallel programming?
On 11/11/2010 07:01 PM, Trass3r wrote: Unfortunately I only know about the standard stuff, OpenMP/OpenCL... Speaking of which: Are there any attempts to support lightweight multithreading in D, that is, something like OpenMP ? That would require compiler support for it. Other than that, there only seems to be dsimcha's std.parallelism.

Ok, that's what I suspected. std.parallelism doesn't look too bad, though; I'll play around with that...
Re: Thoughts on parallel programming?
On Thu, 2010-11-11 at 18:24 +0100, Tobias Pfaff wrote: [ . . . ] Unfortunately I only know about the standard stuff, OpenMP/OpenCL... Speaking of which: Are there any attempts to support lightweight multithreading in D, that is, something like OpenMP ?

I'd hardly call OpenMP lightweight. I agree that as a meta-notation for directing the compiler how to insert appropriate code to force multithreading of certain classes of code, using OpenMP generally beats manual coding of the threads. But OpenMP is very Fortran oriented even though it can be useful for C, and indeed C++ as well. However, given things like Threading Building Blocks (TBB) and the functional-programming-inspired techniques used by Chapel, OpenMP increasingly looks like a hack rather than a solution. Using parallel versions of for, map, filter, reduce in the language is probably a better way forward. Having a D binding to OpenCL (and OpenGL, MPI, etc.) is probably going to be a good thing.

-- Russel.
Re: Thoughts on parallel programming?
On Thu, 2010-11-11 at 15:16 +0100, Fawzi Mohamed wrote: [ . . . ] on this I am not so sure, heterogeneous clusters are more difficult to program, and GPUs and co. are slowly becoming more and more general purpose. Being able to take advantage of those is useful, but I am not convinced they are necessarily the future.

The Intel roadmap is for processor chips that have a number of cores with different architectures. Heterogeneity is not going to be a choice, it is going to be an imposition. And this is at bus level, not at cluster level.

[ . . . ] yes, many-core is the future, I agree on this, and also that a distributed approach is the only way to scale to a really large number of processors. But distributed systems *are* more complex, so I think that for the foreseeable future one will have a hybrid approach.

Hybrid is what I am saying is the future whether we like it or not. SMP as the whole system is the past. I disagree that distributed systems are more complex per se. I suspect comments are getting so general here that anything anyone writes can be seen as both true and false simultaneously. My perception is that shared memory multithreading is less and less a tool that applications programmers should be thinking in terms of. Multiple processes with a hierarchy of communications costs is the overarching architecture, with each process potentially being SMP or CSP or . . .

again not sure the situation is as dire as you paint it, Linux does quite well in the HPC field... but I agree that to be the ideal OS for these architectures it will need more changes.

The Linux driver architecture is already creaking at the seams; it implies a central, monolithic approach to the operating system. This falls down in a multiprocessor shared memory context. The fact that the Top 500 generally use Linux is because it is the least worst option.
M$, despite throwing large amounts of money at the problem, and indeed buying some very high profile names to try and do something about the lack of traction, have failed to make any headway in the HPC operating system stakes. Do you want to have to run a virus checker on your HPC system? My gut reaction is that we are going to see a rise of hypervisors as per Tilera chips, at least in the short to medium term, simply as a bridge from the present OSes to the future. My guess is that L4 microkernels and/or nanokernels, exokernels, etc. will find a central place in future systems. The problem to be solved is ensuring that the appropriate ABI is available on the appropriate core at the appropriate time. Mobility of ABI is the critical factor here.

[ . . . ] Whole array operations are useful, and when possible one gains much using them; unfortunately not all problems can be reduced to a few large array operations. Data-parallel languages are not the main type of language for these reasons.

Agreed. My point was that in 1960s code people explicitly handled array operations using do loops because they had to. Nowadays such code is anathema to efficient execution. My complaint here is that people have put effort into compiler technology instead of rewriting the codes in a better language and/or idiom. Clearly whole array operations only apply to algorithms that involve arrays!

[ . . . ] well, whole array operations are a generalization of the SPMD approach, so in this sense you said that that kind of approach will have a future (but with a more difficult optimization as the hardware is more complex).

I guess this is where the PGAS people are challenging things. Applications can be couched in terms of array algorithms which can be scattered across distributed memory systems. Inappropriate operations lead to huge inefficiencies, but handled correctly, code runs very fast.

About MPI, I think that many don't see what MPI really does: MPI offers a simplified parallel model.
The main weakness of this model is that it assumes some kind of reliability, but then it offers a clear computational model with processors ordered in a linear or higher-dimensional structure and efficient collective communication primitives. Yes, MPI is not the right choice for all problems, but when usable it is very powerful, often superior to the alternatives, and programming with it is *simpler* than thinking about a generic distributed system. So I think that for problems that are not trivially parallel, or easily parallelizable, MPI will remain the best choice. I guess my main irritant with MPI is that I have to run the same executable on every node and, perhaps more importantly, the message-passing structure is founded on Fortran primitive data types. OK, so you can hack up some element of abstraction so as to send complex messages, but it would be far better if the MPI standard provided better abstractions. [ . . . ] It might be a personal thing, but I am kind of suspicious toward PGAS; I find a generalized MPI model
Re: Thoughts on parallel programming?
Tobias Pfaff Wrote: On 11/11/2010 03:24 AM, jfd wrote: Any thoughts on parallel programming? I was looking at something about the Chapel and X10 languages etc. for parallelism, and it looks interesting. I know that it is still an area of active research, and it is not yet (far from?) done, but does anyone have thoughts on this as a future direction? Thank you. Unfortunately I only know about the standard stuff, OpenMP/OpenCL... Speaking of which: are there any attempts to support lightweight multithreading in D, that is, something like OpenMP? I've considered backing spawn() calls by fibers multiplexed by a thread pool (receive() calls would cause the fiber to yield) instead of having each call generate a new kernel thread. The only issue is that TLS (i.e. non-shared static storage) is thread-local, not fiber-local. One idea, however, is to do OSX-style manual TLS inside Fiber, so each fiber would have its own automatic local storage. Perhaps as an experiment I'll create a new derivative of Fiber that does this and see how it works.
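Sean's idea, many cooperative fibers multiplexed on a few kernel threads with per-fiber rather than per-thread storage, can be sketched in miniature. This is a conceptual sketch in Python (generators standing in for D's Fiber), not the proposed D implementation; the round-robin scheduler and the per-fiber dict are invented for illustration:

```python
# Conceptual sketch: cooperative "fibers" (generators) multiplexed on one
# worker, each with its own private storage dict. A thread-local would be
# shared by every fiber on the same thread, so the scheduler gives each
# fiber its own dict instead -- the fiber-local analogue of TLS.

def make_fiber(name, results):
    def body(local):
        local["count"] = 0          # fiber-local, not thread-local
        for _ in range(3):
            local["count"] += 1
            yield                   # cooperative yield back to the scheduler
        results[name] = local["count"]
    return body({})                 # each fiber gets a fresh private dict

def run_scheduler(fibers):
    # Round-robin the fibers until all of them have finished.
    while fibers:
        for gen in list(fibers):
            try:
                next(gen)
            except StopIteration:
                fibers.remove(gen)

results = {}
run_scheduler([make_fiber("a", results), make_fiber("b", results)])
print(results)  # each fiber kept its own counter: {'a': 3, 'b': 3}
```

The point of the sketch is the storage discipline: because both fibers share one thread, any genuinely thread-local variable would be clobbered as they interleave, which is exactly the TLS problem the post describes and why a fiber-local variant inside Fiber would be needed.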
Re: Thoughts on parallel programming?
Thu, 11 Nov 2010 19:41:56 +, Russel Winder wrote: [ . . . ]
Re: Thoughts on parallel programming?
Thu, 11 Nov 2010 20:01:09 +, retard wrote: in CPUs the problems with programmability are slowing things down, and many laptops are still dual-core even though multiple cores are more energy efficient than higher GHz. And my home PC has 8 virtual cores in a single CPU. At least it seems so to me. My last 1- and 2-core systems had a TDP of 65 and 105W. Now it's 130W, and the next gen have 12 cores and 130W TDP. So I currently have 8 CPU cores and 480 GPU cores. Unfortunately many open source applications don't use the GPU (maybe OpenGL 1.0, but usually software rendering; the GPU-accelerated desktops are still buggy and crash prone) and are single threaded. Even some heavier tasks like video encoding use cores very inefficiently. Would MPI help?
Re: Thoughts on parallel programming?
On 11/11/2010 02:41 PM, Sean Kelly wrote: [ . . . ] I've considered backing spawn() calls by fibers multiplexed by a thread pool (receive() calls would cause the fiber to yield) instead of having each call generate a new kernel thread. The only issue is that TLS (i.e. non-shared static storage) is thread-local, not fiber-local. [ . . . ] I actually did something similar for a very simple web server I was experimenting with. It is similar to how Erlang works, in that Erlang processes are, at least to me, similar to fibers, and they are run in one of several threads in the interpreter. The only problem I had was ensuring that my logging was thread-safe. If you could implement a TLS-like system for Fibers, I think that would help prevent that issue. Casey
Re: Thoughts on parallel programming?
Having a D binding to OpenCL is probably going to be a good thing. http://bitbucket.org/trass3r/cl4d/wiki/Home
Re: Thoughts on parallel programming?
Russel Winder wrote: Agreed. My point was that in 1960s code people explicitly handled array operations using do loops because they had to. Nowadays such code is anathema to efficient execution. My complaint here is that people have put effort into compiler technology instead of rewriting the codes in a better language and/or idiom. Clearly whole-array operations only apply to algorithms that involve arrays! Yup. I am bemused by the efforts put into analyzing loops so that they can be rewritten (by the compiler) into a higher level construct, and then the higher level construct is compiled. It is just backwards from what the compiler should be doing. The high level construct is what the programmer should be writing. It shouldn't be something the compiler reconstructs from low level source code.
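Walter's point can be shown in a few lines. This is an illustrative sketch in Python rather than D, with made-up data:

```python
# Toy data for the comparison.
a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

# 1960s style: the intent (elementwise c = a + b) is buried in index
# bookkeeping that a vectorizing compiler must reverse-engineer.
c = [0.0] * len(a)
for i in range(len(a)):
    c[i] = a[i] + b[i]

# Whole-array style: the intent is stated directly, so a compiler or
# runtime can map it straight onto SIMD units or multiple cores
# (numpy's `a + b`, or D's `c[] = a[] + b[]`, express the same thing).
c2 = [x + y for x, y in zip(a, b)]

assert c == c2
print(c)  # [11.0, 22.0, 33.0, 44.0]
```

The two forms compute the same result; the difference is which one the optimizer has to reconstruct and which one it can consume directly.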
Re: Thoughts on parallel programming?
retard Wrote: Thu, 11 Nov 2010 19:41:56 +, Russel Winder wrote: [ . . . ]
Re: Thoughts on parallel programming?
Russel Winder wrote: At the heart of all this is that programmers are taught that an algorithm is a sequence of actions to achieve a goal. Programmers are trained to think sequentially, and this affects their coding. This means that parallelism has to be expressed at a sufficiently high level that programmers can still reason about algorithms as sequential things. I think it's more than being trained to think sequentially. I think it is in the inherent nature of how we think.
Re: Thoughts on parallel programming?
Walter: Yup. I am bemused by the efforts put into analyzing loops so that they can be rewritten (by the compiler) into a higher level construct, and then the higher level construct is compiled. It is just backwards from what the compiler should be doing. The high level construct is what the programmer should be writing. It shouldn't be something the compiler reconstructs from low level source code. I agree a lot. The language has to offer means to express all the semantics and constraints: that the arrays are disjoint, that the operations done on them are pure or not pure, that the operations are not pure but determined only by a small window in the arrays, and so on and on. And then the compiler has to optimize the code according to the presence of SIMD registers, multi-cores, etc. This may not be enough for maximum-performance applications, but in most situations it's plenty. (Incidentally, this is a lot of what the Chapel language does (and D doesn't), and what I have explained in two past posts about Chapel, that were mostly ignored.) Bye, bearophile
Re: Thoughts on parallel programming?
Thu, 11 Nov 2010 16:32:03 -0500, bearophile wrote: [ . . . ] How does Chapel work when I need to sort data (just a basic quicksort on 12 cores, for instance), or e.g. compile many files in parallel, or encode xvid? What is the content of the array with xvid files?
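For what it's worth, the "compile many files in parallel" case is plain task parallelism rather than a whole-array computation: a work queue plus a pool of workers covers it without any data-parallel language support. A hedged sketch in Python, with invented file names and a stand-in for the compiler invocation:

```python
# Task parallelism sketch: independent jobs (one per file) fanned out to
# a worker pool. compile_file is a hypothetical stand-in, not a real
# compiler driver; a real one would invoke dmd/gcc/xvid per input.
from concurrent.futures import ThreadPoolExecutor

def compile_file(name):
    return f"{name}.o"  # pretend we produced an object file

sources = [f"module{i}.d" for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() preserves input order even though jobs finish out of order
    objects = list(pool.map(compile_file, sources))

print(objects[0], len(objects))
```

Because each job touches only its own input and output, no shared-memory coordination is needed beyond the queue itself, which is why build tools parallelize so easily compared with in-place array algorithms.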
Re: Thoughts on parallel programming?
Walter Bright Wrote: Russel Winder wrote: [ . . . ] I think it's more than being trained to think sequentially. I think it is in the inherent nature of how we think. Distributed programming is essentially a bunch of little sequential programs that interact, which is basically how people cooperate in the real world. I think that is by far the most intuitive of any concurrent programming model, though it's still a significant conceptual shift from the traditional monolithic imperative program.
Re: Thoughts on parallel programming?
Sean Kelly Wrote: [ . . . ] Distributed programming is essentially a bunch of little sequential programs that interact, which is basically how people cooperate in the real world. [ . . . ] Intel promised this AVX instruction set next year. Does it also work like distributed processes? I hear it doubles your FLOPS. These are exciting times for parallel computing. Lots of new mediums for distributed message passing programming. Lots of little fibers filling the multimedia pipelines with parallel data. Might even beat the GPU soon if Larrabee comes.
Re: Thoughts on parallel programming?
On 11/11/2010 08:10 PM, Russel Winder wrote: On Thu, 2010-11-11 at 18:24 +0100, Tobias Pfaff wrote: [ . . . ] Unfortunately I only know about the standard stuff, OpenMP/OpenCL... Speaking of which: are there any attempts to support lightweight multithreading in D, that is, something like OpenMP? I'd hardly call OpenMP lightweight. I agree that as a meta-notation for directing the compiler how to insert appropriate code to force multithreading of certain classes of code, using OpenMP generally beats manual coding of the threads. But OpenMP is very Fortran oriented, even though it can be useful for C, and indeed C++ as well. However, given things like Threading Building Blocks (TBB) and the functional-programming-inspired techniques used by Chapel, OpenMP increasingly looks like a hack rather than a solution. Using parallel versions of for, map, filter, reduce in the language is probably a better way forward. Having a D binding to OpenCL (and OpenGL, MPI, etc.) is probably going to be a good thing. Well, I am looking for an easy, efficient way to perform parallel numerical calculations on our 4-8 core machines. With C++, that's OpenMP (or GPGPU stuff using CUDA/OpenCL) for us now. Maybe lightweight was the wrong word; what I meant is that OpenMP is easy to use, and efficient for the problems we are solving. There actually might be better tools for that; honestly we didn't look into that many options -- we are no HPC guys, 1000-CPU clusters are not a relevant scenario, and we are happy that we even started parallelizing our code at all :) Anyway, I was thinking about the logical thing to use in D for this scenario. It's nothing super-fancy, in most cases just a parallel_for, and sometimes a map/reduce operation... Cheers, Tobias
Re: Thoughts on parallel programming?
== Quote from Tobias Pfaff (nos...@spam.no)'s article [ . . . ] I think you'll be very pleased with std.parallelism when/if it gets into Phobos. 
The design philosophy is exactly what you're looking for: Simple shared memory parallelism on multicore computers, assuming no fancy/unusual OS-, compiler- or hardware-level infrastructure. Basically, it's got parallel foreach, parallel map, parallel reduce and parallel tasks. All you need to fully utilize it is DMD and a multicore PC. As a reminder, the docs are at http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html and the code is at http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/std_parallelism.d . If this doesn't meet your needs in its current form, I'd like as much constructive criticism as possible, as long as it's within the scope of simple, everyday parallelism without fancy infrastructure.
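As a rough illustration of the parallel foreach/map/reduce style described above, here is the same pattern in Python's stdlib rather than in std.parallelism itself (where the task pool's parallel map and reduce play the analogous roles):

```python
# Sketch of the parallel map + reduce pattern: a pure function is mapped
# over the input in parallel, then the partial results are combined.
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

def square(x):
    return x * x

data = range(1, 101)
with ThreadPoolExecutor(max_workers=4) as pool:
    squares = list(pool.map(square, data))  # map step, fanned out to workers
total = reduce(operator.add, squares)       # reduce step over the results

print(total)  # 338350, the sum of squares of 1..100
```

The shape is the whole point of the design philosophy quoted above: the programmer states "map this pure function, then combine", and the pool handles scheduling across however many cores the machine has.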
Re: Thoughts on parallel programming?
Gary Whatmore Wrote: %u Wrote: [ . . . ] Intel promised this AVX instruction set next year. Does it also work like distributed processes? I hear it doubles your FLOPS. [ . . . ] AVX isn't parallel programming, it's vector processing, a dying breed of paradigms. Parallel programming deals with concurrency. OpenMP and MPI. Chapel (don't know it, but heard of it here). Fortran. These are all good examples. AVX is just CPU intrinsics stuff in std.intrinsics. Currently the amount of information available is scarce. I have no idea how I use AVX or SSE in D. Auto-vectorization? Does it cover all use cases? So... SSE autovectorization/intrinsics = loops and hand-written inline assembly parts; very small scale (local worker threads / fibers) = dsimcha's lib; medium scale (local area network) = the great flagship distributed message passing system; and huge clusters with 1000+ computers? Why is the message passing system so important? 
Assume I have a dual-core laptop with AVX instructions next year. Use of 2 threads doubles my processor power. Use of AVX gives 8 times more power in good loops. I have no cluster, so the flagship system provides zero benefit.
Re: Thoughts on parallel programming?
jfd: Any thoughts on parallel programming? I was looking at something about the Chapel and X10 languages etc. for parallelism, and it looks interesting. I know that it is still an area of active research, and it is not yet (far from?) done, but does anyone have thoughts on this as a future direction? Thank you. In the past I have posted here two large posts about Chapel; it's a language that contains several good ideas worth stealing, but my posts were mostly ignored. Chapel is designed for heavy numerical computing on multi-cores or multi-CPUs, and it has good ideas about CPU-localization of the work, while D isn't yet very serious about that kind of parallelism. So far D has instead embraced message passing, which is fit for other purposes. Bye, bearophile
Re: Thoughts on parallel programming?
== Quote from jfd (j...@nospam.com)'s article Any thoughts on parallel programming? I was looking at something about the Chapel and X10 languages etc. for parallelism, and it looks interesting. I know that it is still an area of active research, and it is not yet (far from?) done, but does anyone have thoughts on this as a future direction? Thank you. Well, there's my std.parallelism library, which is in review for inclusion in Phobos. (http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html, http://www.dsource.org/projects/scrapple/browser/trunk/parallelFuture/std_parallelism.d) One unfortunate thing about it is that it doesn't use (and actually bypasses) D's thread isolation system and allows unchecked sharing. I couldn't think of any way to create a pedal-to-metal parallelism library that was simultaneously useful and safe and that worked with the language as-is, and I wanted something that worked **now**, not next year or in D3 or whatever, so I decided to omit safety. Given that the library is in review, now would be the perfect time to offer any suggestions on how it can be improved.