Re: Future of performant software: cores or memory?

2014-07-18 Thread Mark Phillips
Hi Gary,

I wrote my initial post in January, but I just wanted to say thanks for 
taking the time to write your reply - I very much appreciated it.

I suspect I will be writing algorithms in C++ for a while to come, but at 
some point I hope to do comparisons with Clojure versions.

Regards,

Mark.

Future of performant software: cores or memory?

2014-01-01 Thread Mark P
I have watched a number of Clojure talks where the concurrent-programming 
benefits of Clojure are emphasized.  People have suggested that the number 
of cores in computers is growing exponentially, and that software 
developers will need programming techniques (e.g. immutable functional 
programming) to harness this fully.  Software that doesn't utilize the 
cores will not perform well and will be left behind.

But I have heard counter arguments to this view of the future...

   1. The number of cores isn't growing very fast.
      - There isn't much demand for more cores, so this growth will be very
        slow for most computing devices.
   2. Memory performance is the key at this time.
      - Whereas raw CPU speed grew dramatically for many years, this has
        never been true of memory speed.  As a result, memory access
        accounts for a high percentage of execution time in modern software.
      - The key to writing performant software at this time (and for many
        years to come) is to concentrate primarily on good memory
        performance.  This means...
        - Keep the memory footprint of programs small.
        - Use various techniques to minimize cache misses.
      - I have heard claims that dramatic program performance improvements
        may be achieved by concentrating on these memory considerations.
      - It is claimed that for the next 5 or 10 years, there will be much
        more performance to be gained from memory optimizations than from
        growth in the number of CPU cores.
      - Yes, utilize the multiple cores where appropriate, but only in
        simple ways.  More focus should be given to performant memory usage
        within each thread than to deeply entwined multi-threaded
        programming.
      - Languages like C/C++ allow memory optimizations much better than
        Java does.  And in Clojure, it sounds like it is harder again.
   
There is a lot that I like about Clojure, but is it unsuitable for software 
where performance will be important into the future?

Or will the increase in multi-core capabilities soon mean that memory 
performance limitations pale into insignificance compared with the 
computational gains achieved through Clojure techniques for effectively 
utilizing many cores?

Or is it possible to improve memory performance (e.g. reduce cache misses) 
within Clojure?

Thanks,

Mark P.



Re: Future of performant software: cores or memory?

2014-01-01 Thread Gary Trakhman
It depends on your workload and use cases, and you could fill many
textbooks with a discussion of the tradeoffs, but I can think of some rules
of thumb that point in Clojure's favor.

The most essential guide for me has been Amdahl's Law:
http://en.wikipedia.org/wiki/File:AmdahlsLaw.svg

To summarize it: "the speedup of a program using multiple processors in
parallel computing is limited by the time needed for the sequential
fraction of the program."
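
As a back-of-the-envelope illustration (my own sketch, names made up),
the formula itself is tiny:

  ;; Amdahl's law: speedup with n processors when a fraction p of the
  ;; work parallelizes.  The serial fraction (1 - p) caps the speedup
  ;; at 1 / (1 - p) no matter how many cores you add.
  (defn amdahl-speedup [p n]
    (/ 1.0 (+ (- 1.0 p) (/ p n))))

  (amdahl-speedup 0.95 8)    ;=> ~5.9x
  (amdahl-speedup 0.95 1000) ;=> ~19.6x - the 5% serial part dominates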

My observation is that 'the program' can be replaced by 'the system'.
Memory and CPU trade off a bit like space and time, but the duality isn't
total: caches and latency mean that memory usage has time implications,
and CPU performance in turn leans on registers and other storage.

For example, in a shared-memory system, the hardware's cache-coherence
protocol can come to dominate the performance of your software once the
software itself is 'good enough'.  At that point I'd argue the protocol
effectively becomes part of the 'sequential fraction' of the system and
program, and so is subject to Amdahl's law.
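
You can even watch this from Clojure.  A sketch (assuming 64-byte cache
lines, i.e. 8 longs per line; timings will vary by machine):

  (import 'java.util.concurrent.atomic.AtomicLongArray)

  ;; Two threads hammering neighboring slots share a cache line, so the
  ;; coherence protocol bounces that line between cores.  Spacing the
  ;; slots a full line apart (8 longs, assuming 64-byte lines) removes
  ;; the fight.
  (defn hammer! [^AtomicLongArray a i]
    (dotimes [_ 20000000]
      (.getAndIncrement a i)))

  (defn race [gap]
    (let [a (AtomicLongArray. 16)
          t1 (future (hammer! a 0))
          t2 (future (hammer! a gap))]
      @t1 @t2))

  (time (race 1))  ; neighbors on one line - lots of coherence traffic
  (time (race 8))  ; a line apart - typically much faster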

When this sort of thing happens often enough, programmers write code that
takes away control from this aspect of the system and does things manually,
since the programmers have better knowledge of the system than the CPU or
compiler ever will.  Switching to a message-passing algorithm might be the
way to do it in this case, manually overriding data sent between cores,
accepting the overhead of data copies and such along the way.

Consider core.async: I think it's a great example of a thread scheduler
and a set of abstractions built to overcome the limitations of OS-provided
threads and schedulers, the limited control they offer, and the backlash
against code written to them.  It's useful because the scenarios it
addresses are now common enough that we need to give ourselves back the
control the OS took away in the first place.
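
The flavor of it, in a minimal (untested) sketch using the public
core.async API:

  (require '[clojure.core.async :as async])

  ;; Two lightweight processes communicating by message-passing rather
  ;; than shared mutable state.  The go blocks are multiplexed over a
  ;; small thread pool by core.async's own scheduler.
  (def ch (async/chan 10))

  (async/go
    (dotimes [i 5]
      (async/>! ch i)))        ; parks (not blocks) if the buffer fills

  (async/go
    (dotimes [_ 5]
      (println "got" (async/<! ch))))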

Core.async is implemented on thread pools, which I'd describe as carving
a delineated subset out of a common resource for one specific usage.
Similarly, it's common in game-dev (and I imagine other fields) to use
memory pools to get predictable or better performance, plus things like
instrumentation.
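
The same carving-out move, done by hand on the JVM (a sketch; the pool
name and size are arbitrary):

  (import '(java.util.concurrent Callable Executors ExecutorService))

  ;; Dedicate a fixed pool to one kind of work, so it can't starve
  ;; (or be starved by) the rest of the system.
  (def ^ExecutorService io-pool (Executors/newFixedThreadPool 4))

  (.submit io-pool ^Callable (fn [] (slurp "http://example.com")))

  ;; (.shutdown io-pool) when you're done with it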

But I would argue that the greatest hindrance to writing the systems we
want is actually programmer productivity.  Taken to an extreme, why don't
we write our own OS's and kernels on bare metal, in ASM or C, every time,
instead of re-using anything?  Because the tradeoffs offered by reusing
existing subsystems are well-understood and convenient, and anything else
is impractical or impossible.

By picking C++ over Clojure/Java for these reasons you're effectively
saying (and it might be true) that the difficulty of optimization and the
need for manual control matter more than all of the benefits (minus
drawbacks) afforded by a higher-level system.

So: it is possible to improve memory locality in Clojure and Java; people
do it all the time.  The Java memory model is not a giant leap away from
x86/64, and there are ways to avoid being subject to the garbage collector
if you want them.
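
For instance, primitive arrays are the standard trick (a sketch; nothing
here is Clojure-specific beyond the array functions):

  ;; A primitive double array is flat, cache-friendly, GC-quiet
  ;; storage; areduce compiles down to an unboxed loop.
  (defn dot-product ^double [^doubles xs ^doubles ys]
    (areduce xs i sum 0.0
             (+ sum (* (aget xs i) (aget ys i)))))

  (dot-product (double-array [1 2 3]) (double-array [4 5 6])) ;=> 32.0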

This blog is a great resource: http://mechanical-sympathy.blogspot.com/

I think Clojure's value-add in the big picture of performance and
parallelism is effective high-level coding, at-least-as-good-as-Java
low-level coding, and the ease of moving between those abstraction levels
in a unified way.

By default, you're getting immutable, functional code, along with whatever
follows from Clojure's particular ease/difficulty tradeoffs.

