On Wednesday, 16 September 2015 at 14:11:04 UTC, Ola Fosheim Grøstad wrote:
On Wednesday, 16 September 2015 at 08:38:25 UTC, deadalnix wrote:
The energy comparison is bullshit. As long as you haven't loaded the data, you don't know how wide they are. Meaning you need either to go pessimistic and load for the worst case scenario or do 2 round trip to memory.

That really depends on memory layout and algorithm. A likely implementation would be a co-processor that would take a unum stream and then pipe it through a network of cores (tile based co-processor). The internal busses between cores are very very fast and with 256+ cores you get tremendous throughput. But you need a good compiler/libraries and software support.


No you don't. Because the streamer still need to load the unum one by one. Maybe 2 by 2 with a fair amount of hardware speculation (which means you are already trading energy for performances, so the energy argument is weak). There is no way you can feed 256+ cores that way.

To gives you a similar example, x86 decoding is often the bottleneck on an x86 CPU. The number of ALUs in x86 over the past decade decreased rather than increased, because you simply can't decode fast enough to feed them. Yet, x86 CPUs have a 64 ways speculative decoding as a first stage.

The hardware is likely to be slower as you'll need way more wiring than for regular floats, and wire is not only cost, but also time.

You need more transistors per ALU, but slower does not matter if the algorithm needs bounded accuracy or if it converge more quickly with unums. The key challenge for him is to create a market, meaning getting the semantics into scientific software and getting initial workable implementations out to scientists.

If there is a market demand, then there will be products. But you need to create the market first. Hence he wrote an easy to read book on the topic and support people who want to implement it.

The problem is not transistor it is wire. Because the damn thing is variadic in every ways, pretty much every bit as input can end up anywhere in the functional unit. That is a LOT of wire.

Reply via email to