Hi, people. Allow me to chat a bit about two vectorisation-related matters, in the context of R. I'm curious whether the following ideas have already been considered and rejected.
The first is about using the so-called Duff's device to partially unroll loops. I did not check the R sources thoroughly, and I am not familiar with them anyway, but the only usage I saw is within "src/gnuwin32/malloc.c". Maybe it could be put to good use in "src/main/arithmetic.c" and elsewhere.

The second is about what is called "chaining" on some vector computers, in which one vector operation uses, as an operand, the result of another vector operation, even before that result is sent to a register or to memory for storage. R could use this technique to spare memory when it "knows" that the result is going to be discarded anyway.

I used and abused Duff's device a good while ago, when I was working in computer graphics; it was routinely used to speed up image-wide operations. With a few properly devised C pre-processor macros, it was made easy to use. (I threw mine away a few years ago, recognising that I had lost interest in low-level coding matters; the macros could easily be rethought anyway.) Questions existed at the time about whether unrolled loops would fit within the specialised instruction-fetch caches of some CPUs, but nowadays memory caches are much bigger than they used to be, and I have the prejudice that it is just not a problem anymore.

More of a concern might be the conditionals implementing vector recycling (already hidden in macros), as they may disrupt the speed of merely falling through linear code. One could probably do without jumps by using clever masking operations, yet I wonder how far we could safely benchmark at configuration time to decide the best code to generate, and how good C would be at expressing masked conditionals. I'm not familiar enough with modern CPUs to judge whether this really needs to be addressed.

I do not doubt that hardware chaining is worth all the effort the engineers put in so that the hardware recognises and activates it on the fly.
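For readers who have not met it, Duff's device as discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name add_duff is hypothetical, the operands are assumed to have equal length (no recycling), and nothing here is taken from the R sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT from the R sources: computes
 * out[i] = x[i] + y[i] for i in [0, n), with the loop body
 * unrolled 8 times by means of Duff's device.
 * Assumption: x, y and out all hold at least n elements. */
void add_duff(double *out, const double *x, const double *y, size_t n)
{
    if (n == 0)
        return;                  /* the do-while below always runs once */

    assert(out != NULL && x != NULL && y != NULL);

    size_t passes = (n + 7) / 8; /* trips through the do-while, >= 1 */

    /* The switch jumps into the middle of the unrolled body, so the
     * first pass handles the n % 8 leftover elements (or a full 8 when
     * the remainder is 0); every later pass handles 8 elements. */
    switch (n % 8) {
    case 0: do { *out++ = *x++ + *y++;
    case 7:      *out++ = *x++ + *y++;
    case 6:      *out++ = *x++ + *y++;
    case 5:      *out++ = *x++ + *y++;
    case 4:      *out++ = *x++ + *y++;
    case 3:      *out++ = *x++ + *y++;
    case 2:      *out++ = *x++ + *y++;
    case 1:      *out++ = *x++ + *y++;
            } while (--passes > 0);
    }
}
```

Calling add_duff(out, x, y, 11), for instance, makes two trips through the loop: the first enters at "case 3" and handles three elements, the second handles the remaining eight. The early return for n == 0 is needed because the do-while form would otherwise process eight elements of an empty vector.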
Vectorised chaining implemented in software, as a way to spare memory, may be quite a challenge, as it requires a sort of half-compilation. On one hand, it might alleviate the memory problems which are often the subject of discussions on R-help; through thrashing, going beyond real memory and into paging may considerably slow down an R application. On the other hand, unless very carefully implemented, chaining overhead might slow down all non-thrashing applications, which is most of them. Nevertheless, being softer on memory requirements is already a concern in R: I vaguely remember having read that R "tries to prove" that a vector being modified will not be needed anymore in its original form, and when the proof succeeds, the original vector gets modified without prior copying. Chaining, despite being difficult to implement, might be a significant further step, and so be worth a discussion.

-- 
François Pinard   http://pinard.progiciels-bpi.ca

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel