On Thu, Oct 15, 2015 at 06:38:10AM -0400, Ganesh Ajjanagadde wrote:
> On Wed, Oct 14, 2015 at 6:53 AM, Hendrik Leppkes <h.lepp...@gmail.com> wrote:
> > On Wed, Oct 14, 2015 at 12:49 PM, Carl Eugen Hoyos <ceho...@ag.or.at> wrote:
> >> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
> >>
> >>> What? My numbers actually show that the new code may be faster -
> >>
> >> No, you are misunderstanding the numbers you posted.
> >> (Or I misunderstand them but nobody said so yet.)
> >>
> >> Highest runs are most relevant, skips have to be
> >> avoided (afaik).
> >>
> >> [...]
> >>
> >>> If you continue to post such stuff that has no basis, I might actually
> >>> get tempted into finding out for which floating point values the new
> >>> code is significantly faster, craft a relevant audio file, and post it
> >>> showing a huge performance difference - my random numbers benchmark
> >>> shows there must exist such values.
> >>
> >> Please do so!
> >>
> >>> > The more important question is if you can see the same
> >>> > changes in the disassembly of af_astats.o as what
> >>> > ubitux posted here for a short test function?
> >>>
> >>> I do. He uses clang/gcc, so do I.
> >>
> >> Sorry, my understanding fails here (I am not a native speaker):
> >> You did look at the disassembly of af_astats.o and there is
> >> inlined code instead of a function call?
> >>
> >>> The reason (irrelevant) is that both
> >>> of us run Arch.
> >>>
> >>> What is "more relevant" is if _you_ can see the changes
> >>> on some non Linux platform.
> >>
> >> If you could show that it is faster on any platform
> >> I would already be happy!
> >>
> >
> > A more important check would be that its not significantly slower on
> > any other platform. Just because one compiler/glibc combination
> > manages to produce an efficient inlined function doesn't necessarily
> > mean that some other compiler or libc couldn't produce a full function
> > call with all the overhead that comes with it, becoming significantly
> > slower.
>
> As I point out, all a libc implementer needs to do to be on par with
> the macro is to add the inline keyword. This was added in c99. If said
> libc does not, then it is fundamentally broken from a performance
> perspective. A beginning programmer can do that in a couple of
> minutes. Fix upstream and complain to them if it does not inline.

I don't know how the latest compilers handle "inline", but a few years
ago gcc was rather dumb about inlining, and I think it's not easy for a
compiler to actually not be "dumb".

A compiler cannot inline everything that has the inline keyword; for
some source code that would lead to an explosion in size and compile
time. And a good compiler will want to inline some functions even if
they do not have the inline keyword.

Also, it's not easy for a compiler to know what to inline and what not.
There could be 10 functions a1(), a2(), a3(), ... each calling the
previous one 10 times ...

The way gcc handled this (in the past at least, AFAIK) is to have
various complicated thresholds that limit the amount of inlining. The
big annoyance with this (years ago at least) was that if you forced a
function to be inlined, gcc would then stop inlining something else,
and you ended up either forcing every single function you needed
inlined or having to tune the thresholds.

It would be interesting to check whether replacing FFABS by fabs causes
any big changes to inlining behavior (maybe that can be done by
comparing the list of symbols in the object files, as fully inlined
functions wouldn't show up, but maybe there are other ways).

Anyway, I am not against using fabs() for float/double FFABS(). I just
think some assumptions in this thread are possibly too optimistic, but
it's quite possible these replacements are all fine and the changes in
inlining, if any, have no performance impact.

Also, if a *abs is implemented using a branch (as in: if it's positive,
jump over a negate instruction), then branch prediction can play a
significant role in performance; that is, random values would be a lot
slower than the same values sorted.

A good implementation should not use a branch though. abs for floats
and doubles is basically just clearing the sign bit; platforms should
have a dedicated instruction for that, or in some cases an integer
and/or could maybe even be used.

[...]
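
To make the branch vs. sign bit point concrete, here is a rough sketch
of the variants being compared. The function names are made up for
illustration and are not FFmpeg or libc code, and the mask version
assumes double is a 64-bit IEEE 754 type with the sign in bit 63.
Whether the ternary actually compiles to a branch depends on the
compiler and flags.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ternary variant, written the way an abs macro is often
 * written. A compiler may turn this into a compare plus a conditional
 * jump, in which case branch prediction matters for random input. */
static inline double abs_ternary(double x)
{
    return x >= 0 ? x : -x;
}

/* Hypothetical mask variant: clear the sign bit with an integer AND.
 * Assumes double is 64-bit IEEE 754 (sign bit is bit 63). memcpy is
 * used to reinterpret the bits without violating aliasing rules. */
static inline double abs_mask(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof(bits));
    bits &= ~(UINT64_C(1) << 63);   /* drop the sign, keep the rest */
    memcpy(&x, &bits, sizeof(x));
    return x;
}

/* The libc route: whether this is inlined or becomes a real call is
 * exactly the compiler/libc dependent question from this thread. */
static inline double abs_libm(double x)
{
    return fabs(x);
}
```

One difference to keep in mind: the ternary returns -0.0 unchanged
(because -0.0 >= 0 is true), while the mask variant and fabs() return
+0.0. Building a file like this with something like "gcc -O2 -S" and
reading the assembly is one way to see what a given compiler really
does with each variant.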

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel