Re: [FFmpeg-devel] [PATCH] AAC encoder: refactor to resynchronize MIPS port
On Fri, Sep 18, 2015 at 3:10 PM, James Almer wrote: > On 9/15/2015 4:24 AM, Claudio Freire wrote: >> This patch refactors the AAC coders to reuse code >> between the MIPS port and the regular, portable C code. >> There were two main functions that had to use >> hand-optimized versions of quantization code: >> - search_for_quantizers_twoloop >> - codebook_trellis_rate >> >> Those two were split into their own template header >> files so they can be inlined inside both the MIPS port >> and the generic code. In each context, they'll link >> to their specialized implementations, and thus be >> optimized by the compiler. >> >> This approach I believe is better than maintaining >> several copies of each function. As past experience has >> proven, having to keep those in sync was error prone. >> In this way, they will remain in sync by default. >> >> Also, an implementation of the reconstructed output >> argument for the optimized quantize_and_encode >> functions is included in the patch. While the current >> implementation of search_for_pred still isn't using >> it, future iterations of main prediction probably will. >> It should not imply any measurable performance hit while >> not being used. >> >> >> Patch attached. > > This broke make checkheaders, since the new headers are purposely > missing functions like abs_pow34_v() when compiled standalone. > http://fate.ffmpeg.org/log.cgi?time=20150918161151&log=compile&slot=x86_64-archlinux-gcc-checkheaders I wasn't aware of checkheaders. I'll add it to skipheaders. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] AAC encoder: refactor to resynchronize MIPS port
On 9/15/2015 4:24 AM, Claudio Freire wrote: > This patch refactors the AAC coders to reuse code > between the MIPS port and the regular, portable C code. > There were two main functions that had to use > hand-optimized versions of quantization code: > - search_for_quantizers_twoloop > - codebook_trellis_rate > > Those two were split into their own template header > files so they can be inlined inside both the MIPS port > and the generic code. In each context, they'll link > to their specialized implementations, and thus be > optimized by the compiler. > > This approach I believe is better than maintaining > several copies of each function. As past experience has > proven, having to keep those in sync was error prone. > In this way, they will remain in sync by default. > > Also, an implementation of the reconstructed output > argument for the optimized quantize_and_encode > functions is included in the patch. While the current > implementation of search_for_pred still isn't using > it, future iterations of main prediction probably will. > It should not imply any measurable performance hit while > not being used. > > > Patch attached. This broke make checkheaders, since the new headers are purposely missing functions like abs_pow34_v() when compiled standalone. http://fate.ffmpeg.org/log.cgi?time=20150918161151&log=compile&slot=x86_64-archlinux-gcc-checkheaders ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] AAC encoder: refactor to resynchronize MIPS port
On Wed, Sep 16, 2015 at 12:30 PM, Nedeljko Babic wrote: Patch attached. I thought it was worth a review. It does include lots of copypaste. FTR, I tested MIPS 74Kf and x86_64 with make fate-aac >>> >>> full fate passes on qemu mips here as well! >> >>If there's no objections then, I will be pushing it later today, >>before it stops applying cleanly. > > LGTM regarding mips part. I have no objection. > > This is very helpful. Thanks. > > Regarding problem with ffserver, I don't think that the cause is the same > with the one in: > http://fate.ffmpeg.org/log.cgi?time=20150914220602&log=compile&slot=mips-linux-gcc-4.3.2 > > The problem on the link is caused by environment on which tests are being > compiled and run. > > You can send me your configuration line and how did you tested it (I am > afraid that I didn't work with ffserver), so I can check what is going on. This is the configure line that works: ./configure --target-os=linux --arch=mips --enable-cross-compile --cross-prefix=mips-linux- --target-exec='qemu-mips -cpu 74Kf' --prefix=/opt/cross --samples=/home/claudiofreire/tmp/audiosamples/ffsamples --extra-ldflags=-static --ranlib=mips-linux-ranlib --disable-ffserver And this one blows with the ICE: ./configure --target-os=linux --arch=mips --enable-cross-compile --cross-prefix=mips-linux- --target-exec='qemu-mips -cpu 74Kf' --prefix=/opt/cross --samples=/home/claudiofreire/tmp/audiosamples/ffsamples --extra-ldflags=-static --ranlib=mips-linux-ranlib > > Also, are you building it on some board, or are you cross compiling it? I'm cross-compiling as you may notice from the above configure lines. $> /opt/cross/bin/mips-linux-gcc --version mips-linux-gcc (GCC) 4.5.3 Copyright (C) 2010 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] AAC encoder: refactor to resynchronize MIPS port
>>> >>> >>> Patch attached. >>> >>> I thought it was worth a review. >>> >>> It does include lots of copypaste. >>> >>> FTR, I tested MIPS 74Kf and x86_64 with make fate-aac >> >> full fate passes on qemu mips here as well! > >If there's no objections then, I will be pushing it later today, >before it stops applying cleanly. LGTM regarding mips part. I have no objection. This is very helpful. Thanks. Regarding problem with ffserver, I don't think that the cause is the same with the one in: http://fate.ffmpeg.org/log.cgi?time=20150914220602&log=compile&slot=mips-linux-gcc-4.3.2 The problem on the link is caused by environment on which tests are being compiled and run. You can send me your configuration line and how did you tested it (I am afraid that I didn't work with ffserver), so I can check what is going on. Also, are you building it on some board, or are you cross compiling it? Thanks again, Nedeljko ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] AAC encoder: refactor to resynchronize MIPS port
On Tue, Sep 15, 2015 at 8:11 AM, Michael Niedermayer wrote: > On Tue, Sep 15, 2015 at 04:24:02AM -0300, Claudio Freire wrote: >> This patch refactors the AAC coders to reuse code >> between the MIPS port and the regular, portable C code. >> There were two main functions that had to use >> hand-optimized versions of quantization code: >> - search_for_quantizers_twoloop >> - codebook_trellis_rate >> >> Those two were split into their own template header >> files so they can be inlined inside both the MIPS port >> and the generic code. In each context, they'll link >> to their specialized implementations, and thus be >> optimized by the compiler. >> >> This approach I believe is better than maintaining >> several copies of each function. As past experience has >> proven, having to keep those in sync was error prone. >> In this way, they will remain in sync by default. >> >> Also, an implementation of the reconstructed output >> argument for the optimized quantize_and_encode >> functions is included in the patch. While the current >> implementation of search_for_pred still isn't using >> it, future iterations of main prediction probably will. >> It should not imply any measurable performance hit while >> not being used. >> >> >> Patch attached. >> >> I thought it was worth a review. >> >> It does include lots of copypaste. >> >> FTR, I tested MIPS 74Kf and x86_64 with make fate-aac > > full fate passes on qemu mips here as well! If there's no objections then, I will be pushing it later today, before it stops applying cleanly. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] AAC encoder: refactor to resynchronize MIPS port
On Tue, Sep 15, 2015 at 04:24:02AM -0300, Claudio Freire wrote: > This patch refactors the AAC coders to reuse code > between the MIPS port and the regular, portable C code. > There were two main functions that had to use > hand-optimized versions of quantization code: > - search_for_quantizers_twoloop > - codebook_trellis_rate > > Those two were split into their own template header > files so they can be inlined inside both the MIPS port > and the generic code. In each context, they'll link > to their specialized implementations, and thus be > optimized by the compiler. > > This approach I believe is better than maintaining > several copies of each function. As past experience has > proven, having to keep those in sync was error prone. > In this way, they will remain in sync by default. > > Also, an implementation of the reconstructed output > argument for the optimized quantize_and_encode > functions is included in the patch. While the current > implementation of search_for_pred still isn't using > it, future iterations of main prediction probably will. > It should not imply any measurable performance hit while > not being used. > > > Patch attached. > > I thought it was worth a review. > > It does include lots of copypaste. > > FTR, I tested MIPS 74Kf and x86_64 with make fate-aac full fate passes on qemu mips here as well! [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Awnsering whenever a program halts or runs forever is On a turing machine, in general impossible (turings halting problem). On any real computer, always possible as a real computer has a finite number of states N, and will either halt in less than N cycles or never halt. signature.asc Description: Digital signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
[FFmpeg-devel] [PATCH] AAC encoder: refactor to resynchronize MIPS port
This patch refactors the AAC coders to reuse code between the MIPS port and the regular, portable C code. There were two main functions that had to use hand-optimized versions of quantization code: - search_for_quantizers_twoloop - codebook_trellis_rate Those two were split into their own template header files so they can be inlined inside both the MIPS port and the generic code. In each context, they'll link to their specialized implementations, and thus be optimized by the compiler. This approach I believe is better than maintaining several copies of each function. As past experience has proven, having to keep those in sync was error prone. In this way, they will remain in sync by default. Also, an implementation of the reconstructed output argument for the optimized quantize_and_encode functions is included in the patch. While the current implementation of search_for_pred still isn't using it, future iterations of main prediction probably will. It should not imply any measurable performance hit while not being used. Patch attached. I thought it was worth a review. It does include lots of copypaste. FTR, I tested MIPS 74Kf and x86_64 with make fate-aac PS: yes, Nedeljko beat me by a small margin in syncing up the implementations, but still I believe sharing code is better than maintaining copies. Especially having in mind the big rewrite that is coming from the work in #2686. So I submit this approach for consideration. PS2: ffserver is causing some ICEs on cross gcc 4.5.3, a slight variation on this fate failure: http://fate.ffmpeg.org/log.cgi?time=20150914220602&log=compile&slot=mips-linux-gcc-4.3.2 - let me know if I can provide some more info on this to get it fixed somehow. From 3a530153a0c4ef18f5a0f936fd356e9493b5a00e Mon Sep 17 00:00:00 2001 From: Claudio Freire Date: Tue, 15 Sep 2015 03:59:45 -0300 Subject: [PATCH] AAC encoder: refactor to resynchronize MIPS port This patch refactors the AAC coders to reuse code between the MIPS port and the regular, portable C code. There were two main functions that had to use hand-optimized versions of quantization code: - search_for_quantizers_twoloop - codebook_trellis_rate Those two were split into their own template header files so they can be inlined inside both the MIPS port and the generic code. In each context, they'll link to their specialized implementations, and thus be optimized by the compiler. This approach I believe is better than maintaining several copies of each function. As past experience has proven, having to keep those in sync was error prone. In this way, they will remain in sync by default. Also, an implementation of the dequantized output argument for the optimized quantize_and_encode functions is included in the patch. While the current implementation of search_for_pred still isn't using it, future iterations of main prediction probably will. It should not imply any measurable performance hit while not being used. --- libavcodec/aaccoder.c| 284 +--- libavcodec/aaccoder_trellis.h| 194 + libavcodec/aaccoder_twoloop.h| 203 ++ libavcodec/aacenc_quantization.h | 14 ++ libavcodec/mips/aaccoder_mips.c | 452 ++- tests/fate/aac.mak | 15 +- 6 files changed, 538 insertions(+), 624 deletions(-) create mode 100644 libavcodec/aaccoder_trellis.h create mode 100644 libavcodec/aaccoder_twoloop.h diff --git a/libavcodec/aaccoder.c b/libavcodec/aaccoder.c index 524987d..8d5ea77 100644 --- a/libavcodec/aaccoder.c +++ b/libavcodec/aaccoder.c @@ -48,6 +48,8 @@ #include "aacenc_tns.h" #include "aacenc_pred.h" +#include "libavcodec/aaccoder_twoloop.h" + /** Frequency in Hz for lower limit of noise substitution **/ #define NOISE_LOW_LIMIT 4000 @@ -59,6 +61,8 @@ * replace low energy non zero bands */ #define NOISE_LAMBDA_REPLACE 1.948f +#include "libavcodec/aaccoder_trellis.h" + /** * structure used in optimal codebook search */ @@ -181,137 +185,6 @@ static void encode_window_bands_info(AACEncContext *s, SingleChannelElement *sce } } -static void codebook_trellis_rate(AACEncContext *s, SingleChannelElement *sce, - int win, int group_len, const float lambda) -{ -BandCodingPath path[120][CB_TOT_ALL]; -int w, swb, cb, start, size; -int i, j; -const int max_sfb = sce->ics.max_sfb; -const int run_bits = sce->ics.num_windows == 1 ? 5 : 3; -const int run_esc = (1 << run_bits) - 1; -int idx, ppos, count; -int stackrun[120], stackcb[120], stack_len; -float next_minbits = INFINITY; -int next_mincb = 0; - -abs_pow34_v(s->scoefs, sce->coeffs, 1024); -start = win*128; -for (cb = 0; cb < CB_TOT_ALL; cb++) { -path[0][cb].cost = run_bits+4; -path[0][cb].prev_idx = -1; -path[0][cb].run = 0; -} -for (swb = 0; swb < max_sfb; swb++) { -size = sce->ics.swb_sizes[swb]; -