Hi, as a motus operandi for this review, I have no time for a proper one, or at least not fitting with John's timeframe. I'll try to close as many pending discussions, and would prefer if someone else completed the review/validation/commit.
2016-01-22 19:33 GMT+01:00 John Cox <j...@kynesim.co.uk>: > Fair enough - though given that your slowdowns are almost certainly > cache-related the whole may be quite different from the sum of the > parts. True, they don't always translate to anything noticeable, but that's the best tool we have to objectively decide. >>cosmetics? > > I renamed the variable, because c_idx can have values 0..2 and c_idx_nz > is a boolean with 0..1 and in the rewrite of the inc var it is important > that we are using the _nz variant so having the var named appropriately > seemed sensible. I don't really mind mixing some form of cosmetics (=supposedly without code generation consequences) although other people prefer splitting for easier review and regression testing. This is not a blocking item for me, just that finding the most appropriate commit would be nice. >>I suppose branch prediction could help here, but less likely than for >>get_cabac_sig_coeff_flag_idxs. >> >>Also for this and some others: why inline over av_always_inline? > No particularly good reason for this one - though for any fn that might > be called from multiple places there is a strong argument for just > "inline" as it allows the compiler to make a judgment call based on how > big L1 cache is and how bad the call penalty. Anyway, those kinds of micro-optimizations I'm suggesting need more testing (sequences, platforms), so let's roll with this. >>AV_WN64 of 0x0101010101010101ULL, or even a memset, as it would be >>inlined by the compiler, given its size, and done the best possible >>way. > > levels is int *, not char * Ok, sorry, then 0x0000000100000001ULL. But you can ignore this, it'll probably make no difference outside of a micro-benchmark. >>Saturation, could be a separate patch, with which I would be ok. btw and iirc, a comment indicated assumptions on what is a "legit" (instead of conforming ) bitstream/coeffs, making a conscious decision. I know Ronald, ffvp9's author, specifically decided to handle equivalent cases in bitstreams (hint) from Argon Designs. I have no opinion, but others might. >>Related to but not strictly bypass ? > > Not bypass per se, more the general optimisation of abs_level_remaining > - it is pulled out because I had a slightly better arm asm version of > the fn. So it could go in that patch, but this allows other asm to > override it if they so desire. What I meant: would better be there than in another commit. >>Doing: >> if (get_cabac(c, state0 + ctx_map[n])) >> *p++ = n; >> while (--n != 0) { >> if (get_cabac(c, state0 + ctx_map[n])) >> *p++ = n; >> } >>is most likely faster, probably also on arm, if the branch prediction is good. > > Not convinced. That will increase code size (as get_cabac will inline) > which can be pure poison as you have found out with USE_N_END_1. That's 300B, not 1.5KB. And I *know* it can help, just not on all platforms and sequences. The same decision was made for ffh264's equivalent, iirc. But this would need more validated results, so let's ignore that discussion. >>boolean? (not sure) > I saw no booleans anywhere in the rest of this code so assumed they were > (at best) deprecated. But if there is an official ffmpeg boolean then > that is what it should be. Sorry for the imprecision, ffmpeg doesn't have a boolean type. I meant this to be possibly split as its own kind of modification, not fitting with anything else. Or not, this shouldn't be a blocking item. >>So we decided to have #define USE_N_END_1 ARCH_ARM. As said in another >>mail, there's a 10-20% loss for the codeblock. >> >>USE_BY22_DIV also strangely provides no benefit. > Slightly surprised by that, but DIV is going to produce smaller code so > should be preferred on general grounds as it puts off cache disasters. There maybe pessimization (spelling?) there, if the compiler doesn't manage to generate the idiv + picking edx for the shifted result. I haven't checked the disassembled code. BR, -- Christophe _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel