On 06/09/2010 01:28 AM, retard wrote:
Wed, 09 Jun 2010 01:13:43 -0400, Nick Sabalausky wrote:

"retard"<r...@tard.com.invalid>  wrote in message
news:hun6ok$13s...@digitalmars.com...
Tue, 08 Jun 2010 16:14:51 -0500, Andrei Alexandrescu wrote:

On 06/08/2010 04:05 PM, Walter Bright wrote:
Andrei Alexandrescu wrote:
On 06/08/2010 01:27 PM, "Jérôme M. Berger" wrote:
Please define "reasonable performance"...

Within 15% of hand-optimized code specialized for the types at hand.

I would have said O(n) or O(log n), as opposed to, say, O(n*n).

General rules for performance improvements:

1. nobody notices a 10% improvement

2. users will start noticing speedups when they exceed 2x

3. a 10x speedup is a game changer

max of n elements is O(n).

This probably means that D 2 won't be very efficient on multicore until
the authors learn some basic parallel programming skills. Now where did
you get your PhD - I'm collecting a list of toy universities people
should avoid.

You used to have meaningful things to say. Now you're just trolling.

Max of n unordered elements can be solved in O(log log n) time assuming
you have enough cores and constant time memory access. Happy now?

When calculating the complexity of an operation, you don't consider cores to be in unlimited supply; their number is a constant. Complexity is expressed in terms of the number of inputs, and max of n elements is O(n) because you need to look at each element at least once.

The number of cores being a constant k, you can then consider that the max computation can be distributed such that each core looks at n/k elements. What's left is k intermediate results, which can be further combined in log(k) time (see below); since we consider k a constant, this approach can claim at most a net k-fold speedup.
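
To make the decomposition concrete, here is a minimal sequential sketch in D; chunkedMax is a name of my own, and the per-chunk loop stands in for the work one of the k cores would do (the k partial maxima are then combined, linearly here since k is a small constant; a log(k) tree combine works the same way):

import std.algorithm : max;

int chunkedMax(const(int)[] a, size_t k)
{
    assert(a.length > 0 && k > 0);
    immutable chunkLen = (a.length + k - 1) / k;    // ceil(n / k) elements per "core"
    int result = a[0];
    for (size_t start = 0; start < a.length; start += chunkLen)
    {
        immutable end = start + chunkLen < a.length ? start + chunkLen : a.length;
        int partial = a[start];                     // the share of one core
        foreach (x; a[start + 1 .. end])
            partial = max(partial, x);
        result = max(result, partial);              // combine the partial maxima
    }
    return result;
}

unittest
{
    assert(chunkedMax([3, 1, 4, 1, 5, 9, 2, 6], 3) == 9);
}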

If the available cores are proportional to n (an unusual assumption), you first compare pairs of the n elements using n/2 cores, then do n/4 comparisons on pairs of the remaining candidates, and so on. Each step halves the number of candidates, so the complexity is O(log n). I'd be curious to find out how that can be reduced to O(log log n).
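
Here is a sequential simulation of that tournament, again only a sketch with a made-up name; the point is that the comparisons within one round are independent of one another, which is exactly where the n/2 cores would be spent:

int tournamentMax(const(int)[] a)
{
    assert(a.length > 0);
    auto work = a.dup;
    while (work.length > 1)
    {
        auto next = new int[(work.length + 1) / 2];
        foreach (i; 0 .. work.length / 2)        // independent; one comparison per core
            next[i] = work[2 * i] > work[2 * i + 1] ? work[2 * i] : work[2 * i + 1];
        if (work.length % 2 == 1)                // odd element out gets a bye
            next[$ - 1] = work[$ - 1];
        work = next;
    }
    return work[0];
}

The loop body runs once per round, and the number of candidates halves each time, hence the O(log n) rounds.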

Finally, for small values of n, you could consider the number of cores sufficiently large, but, as was pointed out, using threads to compute a max is actually impractical, so superscalar execution may be a better avenue. In practice this means exploiting instruction-level parallelism (ILP) by reducing data dependencies between intermediate results. I suggest you take a look at a thread entitled "challenge: implement the max function" that I started on 01/21/2007. That thread discusses ILP issues and continues in the thread "challenge #2: implement the varargs_reduce metafunction". Allow me to quote from my own post on 01/23/2007:

==================
That's a good point, and goes right to the core of my solution, which (similarly to Bruno Medeiros') arranges operations in an optimal way for superscalar evaluation, e.g. max(a, b, c, d) is expanded not to:

max2(max2(max2(a, b), c), d)

but instead to:

max2(max2(a, b), max2(c, d))

The number of operations is the same, but in the latter case there is logarithmically less dependency between partial results. When max2 expands into a primitive comparison, the code is easy prey for a superscalar machine to execute in parallel.

This won't count for much in a great deal of real code, but the fact that this will go into the standard library warrants the thought. Besides, the fact that the language makes it so easy that we can actually think about such subtlety is very encouraging.
==================

The messages that follow further discuss how associative reduce should be ordered to take advantage of superscalar execution.
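
For illustration, here is one way the balanced expansion could be written with variadic templates; max2 and maxOf are names of my own for this sketch, not library functions:

T max2(T)(T a, T b) { return b > a ? b : a; }

// maxOf(a, b, c, d) expands to max2(max2(a, b), max2(c, d)): the two inner
// comparisons carry no data dependency and can execute in parallel.
auto maxOf(Args...)(Args args)
{
    static if (Args.length == 1)
        return args[0];
    else static if (Args.length == 2)
        return max2(args[0], args[1]);
    else
    {
        enum half = Args.length / 2;
        return max2(maxOf(args[0 .. half]), maxOf(args[half .. $]));
    }
}

unittest
{
    assert(maxOf(3, 7, 2, 9) == 9);
}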


Andrei
