On Fri, 22 Jan 2016 12:18:29 +0100, you wrote:

>Hi,
>
>2016-01-20 15:27 GMT+01:00 John Cox <j...@kynesim.co.uk>:
>> The by22 code gained me an overall factor of two in the abs level decode
>> - the gains do depend a lot on the quantity of residual - you gain a lot
>> more on I-frames than you do otherwise as they tend to have much longer
>> residuals. The higher the bitrate the more useful this code is. But as
>> you note it didn't use vast amounts of time relative to everything else
>> anyway.
>>
>> The reworking / simplification of the loop(s) around the abs level
>> decode and the scaling gave me the biggest single improvement.
>
>The thing is, it provided no gain on no Win64 system I had at hand. Or
>very minor, once I switched off things. The amount of new/changed code
>would make it worth discussing, were it not for actual gains on arm.

I think on ARM things fitted within its register limit more often - either
way it was useful. Much of the simplification work was structural, so it
was possible for me to extract simple functions to code in asm.

>> After that the reworking of get_sig_ceoff_flag_idxs was a useful gain
>
>Yes, this is the most agreeable part of the non-applied parts.
>
>> Special caseing the single coeff path gave a similar gain
>
>This is a big slowdown on Win64 and UHD-bluray like sequences, but
>that can be switched off in that case.

I'm a bit surprised that it generated a big slowdown - some cache must be
running just on the edge, but yes, if you normally have hi-bitrate stuff
then it isn't wanted. On my test streams the bitrates were normally quite
low - quite unlike what I would expect from blu-ray sequences. Default it
to off on x86 but on on ARM?

>> After that the scale rework - now probably 75% faster than it was
>> previously but it wasn't taking a huge amount of time.
>
>The work is done, I don't mind.
>
>> And after that all the other bits - my experience with optimising this
>> sort of code (I did a lot of work on a TI H.264 implementation in the
>> past) is that no single change is going to do everything, you just have
>> to polish everything until it goes fast enough.
>
>Sure. There may be positive interactions, but my own figures showed
>the sigmap/greater than flags were the only ones worth optimizing on
>Win64.

Very plausibly.

>> Sorry - I don't quite understand what you've said here.
>
>Doesn't matter anymore, I think I have just laid out the parts
>actually mattering, and for haswell/Win64 (ie x86_64).

I think you've cleared up my misunderstanding in the expanded comments
above.

>I'll reply more in depth to the new patchset, but not until you're on
>holidays. Which should leave me more time for reviewing it, so all the
>better.

Good oh.

JC