On Tue Feb 23 02:36:41 PST 2016, kennylevin...@gmail.com wrote:
> Ah, no - it is not a system-wide adjustment, but adjustment of the plan9 
> specific runtime.sighandler implementation and everything called by it 
> directly. Notes that don't exit the process are queued and should run outside 
> the actual note handler.
> 
> I think the "magic" code will be isolated, and might fend off accidental 
> future additions of floating point registers. The magic-ness also only 
> revolves around avoiding duffzero and duffcopy in some way. I also think that 
> removing conditionals in the compiler will be a positive thing.
> 
> I still do not know the feasibility of my plan, whether it is possible to do 
> cleanly, or possible at all. Maybe someone smarter than me with knowledge on 
> the matter could chime in and call me an idiot?
> 
> Avoiding duffcopy should be easy with a simple memmove implementation. If 
> done right, we can also remove the plan9 specific runtime.memmove and only 
> use the slow memmove in sighandler (the global runtime.memmove is 
> implemented using MOVUPS just like duffcopy. Duffcopy is used for block copies 
> by the compiler in some cases, although I must admit to not knowing all the 
> cases yet).
> 
> Avoiding duffzero without compiler assistance is a bit more tricky - global 
> variables, stack on assembly functions, something like that.
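
For illustration, the "simple memmove" idea quoted above could look roughly
like the sketch below. It is only a Go-level model, not the runtime's code:
the name moveWithin and its index-based signature are hypothetical, and a real
sighandler version would presumably be written in assembly. Whether the
compiler keeps a loop like this free of xmm registers is an assumption that
has not been checked here.

```go
package main

import "fmt"

// moveWithin copies n bytes inside buf from offset src to offset dst,
// one byte at a time. It is a hypothetical stand-in for a "slow" memmove
// that does not rely on duffcopy or on wide (MOVUPS-style) moves.
// Overlapping ranges are handled by choosing the copy direction.
func moveWithin(buf []byte, dst, src, n int) {
	if n <= 0 || dst == src {
		return
	}
	if dst < src {
		// Destination is below the source: copy forward.
		for i := 0; i < n; i++ {
			buf[dst+i] = buf[src+i]
		}
		return
	}
	// Destination is above the source: copy backward so that
	// bytes are read before they are overwritten.
	for i := n - 1; i >= 0; i-- {
		buf[dst+i] = buf[src+i]
	}
}

func main() {
	buf := []byte("abcdefgh")
	moveWithin(buf, 2, 0, 4) // overlapping move toward higher offsets
	fmt.Println(string(buf))
}
```

The direction check is the only subtle part: copying forward when the
destination lies above an overlapping source would overwrite bytes that have
not been read yet.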

fwiw, on modern amd64 machines, using the xmm and ymm registers has a benefit
only in a narrow range of sizes (384-511 bytes) and a subset of
(mis-)alignments that i've forgotten.  at least for the exact test setup i
used on 3-4 different µarches.  intel claims rep; movs is the
(architecturally) fastest way to go.

i am not sure any of this makes much difference, as it's hard to know what a
real-world memory access pattern looks like, and that seems to dominate all
but gigantic moves, for which rep; movs is actually no slower than even the
trickiest use of ymm registers.

- erik
