> [...] the main slowdown
> is reading from the hardware framebuffer, which plan 9 does a lot.
Where in the draw path does Plan 9 read back from the framebuffer, and why so often?

Also, there is apparently a fast way to read back on Radeon-series hardware: map some system memory onto the card via the GART and do the same dance that drm's r100_copy_blit does. If the formats are amenable, it might be nice to expose cards' fast download-from-screen features by adding something like Ctrl->dfs() or the like...

-- vs
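
P.S. A rough sketch of what a hook like that might look like, purely as an illustration. Everything here is hypothetical: the Scr, Ctrl and Rect layouts, the dfs signature, and the readback/softdfs names are made up for the example and are not the existing Plan 9 VGA structures. The idea is only that a driver may supply a fast download-from-screen routine, and the generic path falls back to a plain copy out of the mapped framebuffer when the hook is absent or declines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All types and names below are hypothetical, for illustration only. */
typedef struct Rect Rect;
typedef struct Scr Scr;
typedef struct Ctrl Ctrl;

struct Rect {
	int minx, miny, maxx, maxy;	/* half-open: max edges excluded */
};

struct Scr {
	uint8_t *fb;	/* stand-in for the mapped framebuffer */
	int width;	/* bytes per scan line */
	int bpp;	/* bytes per pixel */
	Ctrl *ctrl;	/* card-specific hooks, may be NULL */
};

struct Ctrl {
	const char *name;
	/*
	 * Hypothetical fast download-from-screen hook.
	 * Returns bytes written to dst, or -1 if the card
	 * cannot handle this request (e.g. unsuitable format).
	 */
	long (*dfs)(Scr *s, Rect r, uint8_t *dst);
};

/* Slow path: copy the rectangle row by row from the framebuffer. */
static long
softdfs(Scr *s, Rect r, uint8_t *dst)
{
	size_t n = (size_t)(r.maxx - r.minx) * (size_t)s->bpp;
	long tot = 0;
	int y;

	for(y = r.miny; y < r.maxy; y++){
		memcpy(dst + tot,
			s->fb + (size_t)y * (size_t)s->width
			      + (size_t)r.minx * (size_t)s->bpp,
			n);
		tot += (long)n;
	}
	return tot;
}

/* Generic entry point: try the card's hook first, then fall back. */
long
readback(Scr *s, Rect r, uint8_t *dst)
{
	long n;

	assert(r.minx <= r.maxx && r.miny <= r.maxy);
	if(s->ctrl != NULL && s->ctrl->dfs != NULL){
		n = s->ctrl->dfs(s, r, dst);
		if(n >= 0)
			return n;
	}
	return softdfs(s, r, dst);
}
```

A Radeon driver would then fill in dfs with whatever GART-backed blit it can manage, and return -1 for formats it does not want to deal with.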