> I think these may not be the best constants as you still do a lot of
> calculation with them so maybe it could be simplified but I don't know
> how. If there's no visible difference in output with normal usage, only in
> corner cases that only happened during testing, then maybe it's too early
> to try to implement this and we could go back to the previous 128 bit
> accumulator unless you found some real world usage where this would
> matter. Otherwise I think this would only matter with a concurrent 2D
> engine where host data writes could continue in the other half of the 256
> bit register while the 2D engine does the operation on the already written
> half. So if we get the same graphical result with only storing 128 bits
> for now we could do that and rethink this when we can run the 2D engine
> concurrently.
>
With well-behaved drivers there should be no difference. The only real behavioral difference I've seen is when host_data blits are ended early. I have tests covering this behavior now, and it's documented, so I have no problem going back to the 128-bit implementation for now. If we do run into real drivers that depend on this, we can always resurrect it.
