[FFmpeg-cvslog] avcodec/mpegvideo_dec: Don't sync AVCodecContext fields manually
ffmpeg | branch: master | Andreas Rheinhardt | Mon Aug 15 17:58:23 2022 +0200| [afd9da24d9da6b4a194c03779e8863b8a66ed745] | committer: Andreas Rheinhardt

avcodec/mpegvideo_dec: Don't sync AVCodecContext fields manually

They are already synced generically in update_context_from_thread()
in pthread_frame.c.

Signed-off-by: Andreas Rheinhardt

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=afd9da24d9da6b4a194c03779e8863b8a66ed745
---

 libavcodec/mpegvideo_dec.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/libavcodec/mpegvideo_dec.c b/libavcodec/mpegvideo_dec.c
index 08385764b4..406c3feacf 100644
--- a/libavcodec/mpegvideo_dec.c
+++ b/libavcodec/mpegvideo_dec.c
@@ -90,11 +90,6 @@ int ff_mpeg_update_thread_context(AVCodecContext *dst,
             return ret;
     }

-    s->avctx->coded_height = s1->avctx->coded_height;
-    s->avctx->coded_width = s1->avctx->coded_width;
-    s->avctx->width = s1->avctx->width;
-    s->avctx->height = s1->avctx->height;
-
     s->quarter_sample = s1->quarter_sample;

     s->coded_picture_number = s1->coded_picture_number;

___
ffmpeg-cvslog mailing list
ffmpeg-cvslog@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-cvslog

To unsubscribe, visit link above, or email
ffmpeg-cvslog-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-cvslog] avcodec/mpegvideo_dec: Remove commented-out cruft
ffmpeg | branch: master | Andreas Rheinhardt | Mon Aug 15 12:04:31 2022 +0200| [22e157c1c6d540040dc356f5e218f021060ccf46] | committer: Andreas Rheinhardt

avcodec/mpegvideo_dec: Remove commented-out cruft

The fields in question were removed in
759001c534287a96dc96d1e274665feb7059145d.

Signed-off-by: Andreas Rheinhardt

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=22e157c1c6d540040dc356f5e218f021060ccf46
---

 libavcodec/mpegvideo_dec.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/libavcodec/mpegvideo_dec.c b/libavcodec/mpegvideo_dec.c
index 7566fe69f9..08385764b4 100644
--- a/libavcodec/mpegvideo_dec.c
+++ b/libavcodec/mpegvideo_dec.c
@@ -73,8 +73,6 @@ int ff_mpeg_update_thread_context(AVCodecContext *dst,
         s->bitstream_buffer_size = s->allocated_bitstream_buffer_size = 0;

         if (s1->context_initialized) {
-//             s->picture_range_start += MAX_PICTURE_COUNT;
-//             s->picture_range_end += MAX_PICTURE_COUNT;
             ff_mpv_idct_init(s);
             if ((err = ff_mpv_common_init(s)) < 0) {
                 memset(s, 0, sizeof(*s));
[FFmpeg-cvslog] doc: fix binary values of SI prefixes
ffmpeg | branch: master | Chema Gonzalez | Wed Aug 17 10:05:39 2022 -0700| [59225b459fcf6c8b20ef7585cd87e73cb6a4113d] | committer: Andreas Rheinhardt

doc: fix binary values of SI prefixes

Signed-off-by: Andreas Rheinhardt

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=59225b459fcf6c8b20ef7585cd87e73cb6a4113d
---

 doc/utils.texi | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/utils.texi b/doc/utils.texi
index 232a0608b3..627b55d154 100644
--- a/doc/utils.texi
+++ b/doc/utils.texi
@@ -1073,13 +1073,13 @@ indication of the corresponding powers of 10 and of 2.
 @item T
 10^12 / 2^40
 @item P
-10^15 / 2^40
+10^15 / 2^50
 @item E
-10^18 / 2^50
+10^18 / 2^60
 @item Z
-10^21 / 2^60
+10^21 / 2^70
 @item Y
-10^24 / 2^70
+10^24 / 2^80
 @end table

 @c man end EXPRESSION EVALUATION
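
The corrected entries follow one rule: the k-th upper-case prefix (counting K as 1) stands for 10^(3k) in decimal and 2^(10k) in binary. A minimal C sketch of that rule follows. The helper names are hypothetical and are not part of FFmpeg's expression evaluator; the K, M and G positions are assumed from the usual SI ordering rather than taken from the hunk above.

```c
#include <string.h>

/* Hypothetical helper: position of an upper-case SI prefix in the
 * sequence K, M, G, T, P, E, Z, Y (K = 1), or 0 if it is not listed. */
static int si_prefix_index(char prefix)
{
    static const char order[] = "KMGTPEZY";
    const char *p = prefix ? strchr(order, prefix) : NULL;
    return p ? (int)(p - order) + 1 : 0;
}

/* Decimal interpretation: the prefix means 10^(3k). */
static int si_decimal_exponent(char prefix)
{
    return 3 * si_prefix_index(prefix);
}

/* Binary interpretation: the prefix means 2^(10k). */
static int si_binary_exponent(char prefix)
{
    return 10 * si_prefix_index(prefix);
}
```

With this rule, P maps to 10^15 / 2^50 and Y to 10^24 / 2^80, which matches the fixed table.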
[FFmpeg-cvslog] avcodec/ffv1enc: Remove redundant wrapper
ffmpeg | branch: master | Andreas Rheinhardt | Sun Aug 14 15:28:56 2022 +0200| [3553b70d6d1282c1118e12f78b90c402e0d5f25c] | committer: Andreas Rheinhardt

avcodec/ffv1enc: Remove redundant wrapper

Signed-off-by: Andreas Rheinhardt

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=3553b70d6d1282c1118e12f78b90c402e0d5f25c
---

 libavcodec/ffv1enc.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/libavcodec/ffv1enc.c b/libavcodec/ffv1enc.c
index 6f8b8275b5..90593fbaf1 100644
--- a/libavcodec/ffv1enc.c
+++ b/libavcodec/ffv1enc.c
@@ -1240,12 +1240,6 @@ static int encode_frame(AVCodecContext *avctx, AVPacket *pkt,
     return 0;
 }

-static av_cold int encode_close(AVCodecContext *avctx)
-{
-    ff_ffv1_close(avctx);
-    return 0;
-}
-
 #define OFFSET(x) offsetof(FFV1Context, x)
 #define VE AV_OPT_FLAG_VIDEO_PARAM | AV_OPT_FLAG_ENCODING_PARAM
 static const AVOption options[] = {
@@ -1281,7 +1275,7 @@ const FFCodec ff_ffv1_encoder = {
     .priv_data_size = sizeof(FFV1Context),
     .init = encode_init,
     FF_CODEC_ENCODE_CB(encode_frame),
-    .close = encode_close,
+    .close = ff_ffv1_close,
     .p.capabilities = AV_CODEC_CAP_SLICE_THREADS | AV_CODEC_CAP_DELAY,
     .p.pix_fmts = (const enum AVPixelFormat[]) {
         AV_PIX_FMT_YUV420P, AV_PIX_FMT_YUVA420P, AV_PIX_FMT_YUVA422P, AV_PIX_FMT_YUV444P,
[FFmpeg-cvslog] avcodec/ffv1enc: Don't create and keep unnecessary reference
ffmpeg | branch: master | Andreas Rheinhardt | Sun Aug 14 13:37:56 2022 +0200| [7e9a79044105a649c6091049909ad242e6b35d2e] | committer: Andreas Rheinhardt

avcodec/ffv1enc: Don't create and keep unnecessary reference

Signed-off-by: Andreas Rheinhardt

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=7e9a79044105a649c6091049909ad242e6b35d2e
---

 libavcodec/ffv1.h    |  1 +
 libavcodec/ffv1enc.c | 15 ++-
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/libavcodec/ffv1.h b/libavcodec/ffv1.h
index ac80fa85ce..3532815501 100644
--- a/libavcodec/ffv1.h
+++ b/libavcodec/ffv1.h
@@ -91,6 +91,7 @@ typedef struct FFV1Context {
     struct FFV1Context *fsrc;

     AVFrame *cur;
+    const AVFrame *cur_enc_frame;
     int plane_count;
     int ac; ///< 1=range coder <-> 0=golomb rice
     int ac_byte_count; ///< number of bytes used for AC coding
diff --git a/libavcodec/ffv1enc.c b/libavcodec/ffv1enc.c
index ec06636db5..6f8b8275b5 100644
--- a/libavcodec/ffv1enc.c
+++ b/libavcodec/ffv1enc.c
@@ -916,12 +916,12 @@ static void encode_slice_header(FFV1Context *f, FFV1Context *fs)
         put_symbol(c, state, f->plane[j].quant_table_index, 0);
         av_assert0(f->plane[j].quant_table_index == f->context_model);
     }
-    if (!f->picture.f->interlaced_frame)
+    if (!f->cur_enc_frame->interlaced_frame)
         put_symbol(c, state, 3, 0);
     else
-        put_symbol(c, state, 1 + !f->picture.f->top_field_first, 0);
-    put_symbol(c, state, f->picture.f->sample_aspect_ratio.num, 0);
-    put_symbol(c, state, f->picture.f->sample_aspect_ratio.den, 0);
+        put_symbol(c, state, 1 + !f->cur_enc_frame->top_field_first, 0);
+    put_symbol(c, state, f->cur_enc_frame->sample_aspect_ratio.num, 0);
+    put_symbol(c, state, f->cur_enc_frame->sample_aspect_ratio.den, 0);
     if (f->version > 3) {
         put_rac(c, state, fs->slice_coding_mode == 1);
         if (fs->slice_coding_mode == 1)
@@ -1024,7 +1024,7 @@ static int encode_slice(AVCodecContext *c, void *arg)
     int height = fs->slice_height;
     int x = fs->slice_x;
     int y = fs->slice_y;
-    const AVFrame *const p = f->picture.f;
+    const AVFrame *const p = f->cur_enc_frame;
     const int ps = av_pix_fmt_desc_get(c->pix_fmt)->comp[0].step;
     int ret;
     RangeCoder c_bak = fs->c;
@@ -1098,7 +1098,6 @@ static int encode_frame(AVCodecContext *avctx, AVPacket *pkt,
 {
     FFV1Context *f = avctx->priv_data;
     RangeCoder *const c = &f->slice_context[0]->c;
-    AVFrame *const p = f->picture.f;
     uint8_t keystate = 128;
     uint8_t *buf_p;
     int i, ret;
@@ -1165,9 +1164,7 @@ static int encode_frame(AVCodecContext *avctx, AVPacket *pkt,
     ff_init_range_encoder(c, pkt->data, pkt->size);
     ff_build_rac_states(c, 0.05 * (1LL << 32), 256 - 8);

-    av_frame_unref(p);
-    if ((ret = av_frame_ref(p, pict)) < 0)
-        return ret;
+    f->cur_enc_frame = pict;
     if (avctx->gop_size == 0 || f->picture_number % avctx->gop_size == 0) {
         put_rac(c, &keystate, 1);
[FFmpeg-cvslog] avcodec/get_buffer: Don't get AVPixFmtDescriptor unnecessarily
ffmpeg | branch: master | Andreas Rheinhardt | Mon Aug 15 10:18:22 2022 +0200| [f76cef5c518a9874ec4e3b4b36c5b909c3452919] | committer: Andreas Rheinhardt

avcodec/get_buffer: Don't get AVPixFmtDescriptor unnecessarily

It is unused since 3575a495f6dcc395656343380e13c57d48b9f976
(and the error message is dangerous: av_get_pix_fmt_name(format)
returns NULL iff av_pix_fmt_desc_get(format) returns NULL and
using a NULL string for %s would be UB).

Signed-off-by: Andreas Rheinhardt

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=f76cef5c518a9874ec4e3b4b36c5b909c3452919
---

 libavcodec/get_buffer.c | 8
 1 file changed, 8 deletions(-)

diff --git a/libavcodec/get_buffer.c b/libavcodec/get_buffer.c
index 3e45a0479f..a04fd878de 100644
--- a/libavcodec/get_buffer.c
+++ b/libavcodec/get_buffer.c
@@ -246,7 +246,6 @@ fail:
 static int video_get_buffer(AVCodecContext *s, AVFrame *pic)
 {
     FramePool *pool = (FramePool*)s->internal->pool->data;
-    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(pic->format);
     int i;

     if (pic->data[0] || pic->data[1] || pic->data[2] || pic->data[3]) {
@@ -254,13 +253,6 @@ static int video_get_buffer(AVCodecContext *s, AVFrame *pic)
         return -1;
     }

-    if (!desc) {
-        av_log(s, AV_LOG_ERROR,
-               "Unable to get pixel format descriptor for format %s\n",
-               av_get_pix_fmt_name(pic->format));
-        return AVERROR(EINVAL);
-    }
-
     memset(pic->data, 0, sizeof(pic->data));
     pic->extended_data = pic->data;
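
The hazard named in the commit message is generic C: passing a NULL pointer for a %s conversion is undefined behaviour. When such a message is actually needed, one defensive pattern is to substitute a placeholder first. The following is only a sketch; name_or_unknown() and format_error() are hypothetical helpers, not FFmpeg API.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: return a printable placeholder instead of NULL. */
static const char *name_or_unknown(const char *name)
{
    return name ? name : "unknown";
}

/* Formats an error message into buf; name may be NULL.
 * Returns whatever snprintf() returns. */
static int format_error(char *buf, size_t size, const char *name)
{
    return snprintf(buf, size, "Unable to get descriptor for format %s",
                    name_or_unknown(name));
}
```

The commit itself takes the simpler route of deleting the unused check, so no such guard is needed there.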
[FFmpeg-cvslog] avcodec/mpegpicture: Reset fields explicitly instead of memsetting them
ffmpeg | branch: master | Andreas Rheinhardt | Sun Aug 14 00:06:16 2022 +0200| [e50684318390b5cffc68da131f7630f11814b808] | committer: Andreas Rheinhardt

avcodec/mpegpicture: Reset fields explicitly instead of memsetting them

Improves the grepability of the code. (Furthermore, I hope
that no compiler will really call memset for 28 bytes.)

Signed-off-by: Andreas Rheinhardt

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=e50684318390b5cffc68da131f7630f11814b808
---

 libavcodec/mpegpicture.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/libavcodec/mpegpicture.c b/libavcodec/mpegpicture.c
index 711ce35f9d..977bc65191 100644
--- a/libavcodec/mpegpicture.c
+++ b/libavcodec/mpegpicture.c
@@ -311,8 +311,6 @@ fail:
  */
 void ff_mpeg_unref_picture(AVCodecContext *avctx, Picture *pic)
 {
-    int off = offsetof(Picture, hwaccel_priv_buf) + sizeof(pic->hwaccel_priv_buf);
-
     pic->tf.f = pic->f;
     /* WM Image / Screen codecs allocate internal buffers with different
      * dimensions / colorspaces; ignore user-defined callbacks for these. */
@@ -328,7 +326,12 @@ void ff_mpeg_unref_picture(AVCodecContext *avctx, Picture *pic)
     if (pic->needs_realloc)
         free_picture_tables(pic);

-    memset((uint8_t*)pic + off, 0, sizeof(*pic) - off);
+    pic->hwaccel_picture_private = NULL;
+    pic->field_picture = 0;
+    pic->b_frame_score = 0;
+    pic->needs_realloc = 0;
+    pic->reference = 0;
+    pic->shared = 0;
 }

 int ff_update_picture_tables(Picture *dst, const Picture *src)
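
The two reset styles can be compared side by side on a toy struct. DemoPicture below is a hypothetical stand-in and does not reproduce the real Picture layout; the sketch only illustrates why the offsetof-based memset depends on field order while the explicit version can be found with grep.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: buffers first, plain state fields at the end. */
typedef struct DemoPicture {
    void *buf;            /* must survive a reset                     */
    void *priv_buf;       /* last field before the resettable tail    */
    void *priv;
    int   field_picture;
    int   reference;
    int   shared;
} DemoPicture;

/* Old style: zero every byte located after priv_buf. Anything added
 * after priv_buf later is silently included in the reset. */
static void reset_tail_memset(DemoPicture *pic)
{
    size_t off = offsetof(DemoPicture, priv_buf) + sizeof(pic->priv_buf);
    memset((uint8_t *)pic + off, 0, sizeof(*pic) - off);
}

/* New style: every field that is reset is named in the source. */
static void reset_tail_explicit(DemoPicture *pic)
{
    pic->priv          = NULL;
    pic->field_picture = 0;
    pic->reference     = 0;
    pic->shared        = 0;
}
```

Both functions leave buf and priv_buf untouched. The memset variant additionally relies on an all-zero bit pattern being a null pointer, which holds on mainstream platforms but is not guaranteed by the C standard.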
[FFmpeg-cvslog] avcodec/h263dec: Don't set frame parameters redundantly
ffmpeg | branch: master | Andreas Rheinhardt | Sat Aug 13 20:59:07 2022 +0200| [f0ea5094afa3b056cf9c2f71cacadb4fbb7a6a95] | committer: Andreas Rheinhardt

avcodec/h263dec: Don't set frame parameters redundantly

This frame will be reset later in ff_mpv_frame_start() anyway.

Signed-off-by: Andreas Rheinhardt

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=f0ea5094afa3b056cf9c2f71cacadb4fbb7a6a95
---

 libavcodec/h263dec.c | 4
 1 file changed, 4 deletions(-)

diff --git a/libavcodec/h263dec.c b/libavcodec/h263dec.c
index a65f16caea..a14d7811f5 100644
--- a/libavcodec/h263dec.c
+++ b/libavcodec/h263dec.c
@@ -583,10 +583,6 @@ retry:
         s->codec_id == AV_CODEC_ID_H263I)
         s->gob_index = H263_GOB_HEIGHT(s->height);

-    // for skipping the frame
-    s->current_picture.f->pict_type = s->pict_type;
-    s->current_picture.f->key_frame = s->pict_type == AV_PICTURE_TYPE_I;
-
     /* skip B-frames if we don't have reference frames */
     if (!s->last_picture_ptr &&
         (s->pict_type == AV_PICTURE_TYPE_B || s->droppable))
[FFmpeg-cvslog] avcodec/h263dec: Remove redundant code to set cur_pic_ptr
ffmpeg | branch: master | Andreas Rheinhardt | Sat Aug 13 20:37:11 2022 +0200| [74d623914f02aa79447df43a742efd0929dded04] | committer: Andreas Rheinhardt

avcodec/h263dec: Remove redundant code to set cur_pic_ptr

It is done later in ff_mpv_frame_start() (and nobody uses
current_picture_ptr between setting it in ff_mpv_frame_start()).

(The reason the vsynth*-h263-obmc ref files change is because
the call to ff_find_unused_picture() now happens after
the older pictures have been unreferenced in ff_mpv_frame_start(),
so that their slots in the picture array can be immediately
reused; the obmc code is somehow buggy and changes its output
depending on the earlier contents of the motion_val buffer.)

Signed-off-by: Andreas Rheinhardt

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=74d623914f02aa79447df43a742efd0929dded04
---

 libavcodec/h263dec.c                   | 7 ---
 tests/ref/vsynth/vsynth1-h263-obmc     | 4 ++--
 tests/ref/vsynth/vsynth2-h263-obmc     | 4 ++--
 tests/ref/vsynth/vsynth_lena-h263-obmc | 4 ++--
 4 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/libavcodec/h263dec.c b/libavcodec/h263dec.c
index 8db0eccd89..a65f16caea 100644
--- a/libavcodec/h263dec.c
+++ b/libavcodec/h263dec.c
@@ -543,13 +543,6 @@ retry:
         return ret;
     }

-    if (!s->current_picture_ptr || s->current_picture_ptr->f->data[0]) {
-        int i = ff_find_unused_picture(s->avctx, s->picture, 0);
-        if (i < 0)
-            return i;
-        s->current_picture_ptr = &s->picture[i];
-    }
-
     avctx->has_b_frames = !s->low_delay;

     if (CONFIG_MPEG4_DECODER && avctx->codec_id == AV_CODEC_ID_MPEG4) {
diff --git a/tests/ref/vsynth/vsynth1-h263-obmc b/tests/ref/vsynth/vsynth1-h263-obmc
index b7a267a8cb..aed283ed53 100644
--- a/tests/ref/vsynth/vsynth1-h263-obmc
+++ b/tests/ref/vsynth/vsynth1-h263-obmc
@@ -1,4 +1,4 @@
 7dec64380f375e5118b66f3b1e24 *tests/data/fate/vsynth1-h263-obmc.avi
 657320 tests/data/fate/vsynth1-h263-obmc.avi
-844f7ee27fa122e199fe20987b41a15c *tests/data/fate/vsynth1-h263-obmc.out.rawvideo
-stddev:8.16 PSNR: 29.89 MAXDIFF: 113 bytes: 7603200/ 7603200
+2a69f6b37378aa34418dfd04ec98c1c8 *tests/data/fate/vsynth1-h263-obmc.out.rawvideo
+stddev:8.38 PSNR: 29.66 MAXDIFF: 116 bytes: 7603200/ 7603200
diff --git a/tests/ref/vsynth/vsynth2-h263-obmc b/tests/ref/vsynth/vsynth2-h263-obmc
index 2cef7f551b..c0dcc3239e 100644
--- a/tests/ref/vsynth/vsynth2-h263-obmc
+++ b/tests/ref/vsynth/vsynth2-h263-obmc
@@ -1,4 +1,4 @@
 2d8a58b295e03f94e6a41468b2d3909e *tests/data/fate/vsynth2-h263-obmc.avi
 208522 tests/data/fate/vsynth2-h263-obmc.avi
-4a939ef99fc759293f2e609bfcacd2a4 *tests/data/fate/vsynth2-h263-obmc.out.rawvideo
-stddev:6.10 PSNR: 32.41 MAXDIFF: 90 bytes: 7603200/ 7603200
+3500b4227c1e6309ca5213414599266f *tests/data/fate/vsynth2-h263-obmc.out.rawvideo
+stddev:6.19 PSNR: 32.29 MAXDIFF: 111 bytes: 7603200/ 7603200
diff --git a/tests/ref/vsynth/vsynth_lena-h263-obmc b/tests/ref/vsynth/vsynth_lena-h263-obmc
index 5b963107f6..78d7cc7277 100644
--- a/tests/ref/vsynth/vsynth_lena-h263-obmc
+++ b/tests/ref/vsynth/vsynth_lena-h263-obmc
@@ -1,4 +1,4 @@
 3c6946f808412ac320be9e0c36051ea2 *tests/data/fate/vsynth_lena-h263-obmc.avi
 154730 tests/data/fate/vsynth_lena-h263-obmc.avi
-588d992d9d8096da8bdc5027268da914 *tests/data/fate/vsynth_lena-h263-obmc.out.rawvideo
-stddev:5.39 PSNR: 33.49 MAXDIFF: 82 bytes: 7603200/ 7603200
+737af7fb166e2260ba049ae6bc30673d *tests/data/fate/vsynth_lena-h263-obmc.out.rawvideo
+stddev:5.42 PSNR: 33.44 MAXDIFF: 77 bytes: 7603200/ 7603200
[FFmpeg-cvslog] checkasm/sw_scale: hscale does not requires cpuflag test.
ffmpeg | branch: master | Alan Kelly | Fri Jul 15 17:01:31 2022 +0200| [da0a37bab7434ef485146ce8575c7948db1fe3e2] | committer: Anton Khirnov

checkasm/sw_scale: hscale does not requires cpuflag test.

This is done in ff_shuffle_filter_coefficients.

Signed-off-by: Anton Khirnov

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=da0a37bab7434ef485146ce8575c7948db1fe3e2
---

 tests/checkasm/sw_scale.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index 9c07dd0421..86d266fb3e 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_scale.c
@@ -278,8 +278,6 @@ static void check_hscale(void)
                        const uint8_t *src, const int16_t *filter,
                        const int32_t *filterPos, int filterSize);

-    int cpu_flags = av_get_cpu_flags();
-
     ctx = sws_alloc_context();
     if (sws_init_context(ctx, NULL, NULL) < 0)
         fail();
@@ -328,8 +326,7 @@ static void check_hscale(void)
                 ctx->dstW = ctx->chrDstW = input_sizes[dstWi];
                 ff_sws_init_scale(ctx);
                 memcpy(filterAvx2, filter, sizeof(uint16_t) * (SRC_PIXELS * MAX_FILTER_WIDTH + MAX_FILTER_WIDTH));
-                if ((cpu_flags & AV_CPU_FLAG_AVX2) && !(cpu_flags & AV_CPU_FLAG_SLOW_GATHER))
-                    ff_shuffle_filter_coefficients(ctx, filterPosAvx, width, filterAvx2, ctx->dstW);
+                ff_shuffle_filter_coefficients(ctx, filterPosAvx, width, filterAvx2, ctx->dstW);
                 if (check_func(ctx->hcScale, "hscale_%d_to_%d__fs_%d_dstW_%d", ctx->srcBpc, ctx->dstBpc + 1, width, ctx->dstW)) {
                     memset(dst0, 0, SRC_PIXELS * sizeof(dst0[0]));
[FFmpeg-cvslog] libswscale: Enable hscale_avx2 for all input sizes.
ffmpeg | branch: master | Alan Kelly | Fri Jul 15 16:59:43 2022 +0200| [a38293e4448c9389e604af9858984361a5677a20] | committer: Anton Khirnov

libswscale: Enable hscale_avx2 for all input sizes.

ff_shuffle_filter_coefficients shuffles the tail as required.

Signed-off-by: Anton Khirnov

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=a38293e4448c9389e604af9858984361a5677a20
---

 libswscale/utils.c        | 19 ---
 libswscale/x86/swscale.c  |  6 ++
 tests/checkasm/sw_scale.c |  2 +-
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/libswscale/utils.c b/libswscale/utils.c
index 34503e57f4..baa1791ebe 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -268,8 +268,7 @@ int ff_shuffle_filter_coefficients(SwsContext *c, int *filterPos,
 #if ARCH_X86_64
     int i, j, k;
     int cpu_flags = av_get_cpu_flags();
-    // avx2 hscale filter processes 16 pixel blocks.
-    if (!filter || dstW % 16 != 0)
+    if (!filter)
         return 0;
     if (EXTERNAL_AVX2_FAST(cpu_flags) && !(cpu_flags & AV_CPU_FLAG_SLOW_GATHER)) {
         if ((c->srcBpc == 8) && (c->dstBpc <= 14)) {
@@ -281,9 +280,11 @@ int ff_shuffle_filter_coefficients(SwsContext *c, int *filterPos,
             }
             // Do not swap filterPos for pixels which won't be processed by
             // the main loop.
-            for (i = 0; i + 8 <= dstW; i += 8) {
+            for (i = 0; i + 16 <= dstW; i += 16) {
                 FFSWAP(int, filterPos[i + 2], filterPos[i + 4]);
                 FFSWAP(int, filterPos[i + 3], filterPos[i + 5]);
+                FFSWAP(int, filterPos[i + 10], filterPos[i + 12]);
+                FFSWAP(int, filterPos[i + 11], filterPos[i + 13]);
             }
             if (filterSize > 4) {
                 // 16 pixels are processed at a time.
@@ -297,6 +298,18 @@ int ff_shuffle_filter_coefficients(SwsContext *c, int *filterPos,
                         }
                     }
                 }
+                // 4 pixels are processed at a time in the tail.
+                for (; i < dstW; i += 4) {
+                    // 4 filter coeffs are processed at a time.
+                    int rem = dstW - i >= 4 ? 4 : dstW - i;
+                    for (k = 0; k + 4 <= filterSize; k += 4) {
+                        for (j = 0; j < rem; ++j) {
+                            int from = (i + j) * filterSize + k;
+                            int to = i * filterSize + j * 4 + k * 4;
+                            memcpy(&filter[to], &filterCopy[from], 4 * sizeof(int16_t));
+                        }
+                    }
+                }
             }
             av_free(filterCopy);
         }
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 89ef9f5d2b..ec1ca0e01c 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -625,10 +625,8 @@ switch(c->dstBpc){ \

     if (EXTERNAL_AVX2_FAST(cpu_flags) && !(cpu_flags & AV_CPU_FLAG_SLOW_GATHER)) {
         if ((c->srcBpc == 8) && (c->dstBpc <= 14)) {
-            if (c->chrDstW % 16 == 0)
-                ASSIGN_AVX2_SCALE_FUNC(c->hcScale, c->hChrFilterSize);
-            if (c->dstW % 16 == 0)
-                ASSIGN_AVX2_SCALE_FUNC(c->hyScale, c->hLumFilterSize);
+            ASSIGN_AVX2_SCALE_FUNC(c->hcScale, c->hChrFilterSize);
+            ASSIGN_AVX2_SCALE_FUNC(c->hyScale, c->hLumFilterSize);
         }
     }

diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
index cbe4460a99..9c07dd0421 100644
--- a/tests/checkasm/sw_scale.c
+++ b/tests/checkasm/sw_scale.c
@@ -329,7 +329,7 @@ static void check_hscale(void)
                 ff_sws_init_scale(ctx);
                 memcpy(filterAvx2, filter, sizeof(uint16_t) * (SRC_PIXELS * MAX_FILTER_WIDTH + MAX_FILTER_WIDTH));
                 if ((cpu_flags & AV_CPU_FLAG_AVX2) && !(cpu_flags & AV_CPU_FLAG_SLOW_GATHER))
-                    ff_shuffle_filter_coefficients(ctx, filterPosAvx, width, filterAvx2, SRC_PIXELS);
+                    ff_shuffle_filter_coefficients(ctx, filterPosAvx, width, filterAvx2, ctx->dstW);
                 if (check_func(ctx->hcScale, "hscale_%d_to_%d__fs_%d_dstW_%d", ctx->srcBpc, ctx->dstBpc + 1, width, ctx->dstW)) {
                     memset(dst0, 0, SRC_PIXELS * sizeof(dst0[0]));
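
The loop bounds in the utils.c hunk divide a row of dstW output pixels into full 16-pixel blocks followed by tail steps of at most 4 pixels. A small C sketch of that split is given here; BlockSplit and split_width() are hypothetical names used only for illustration, and the sketch counts blocks without touching any coefficients.

```c
/* Hypothetical summary of how dst_w output pixels are divided. */
typedef struct BlockSplit {
    int main_blocks;   /* full 16-pixel blocks for the main loop        */
    int tail_blocks;   /* steps of up to 4 pixels taken by the tail     */
    int last_rem;      /* pixels in the final tail step (1..4), else 0  */
} BlockSplit;

static BlockSplit split_width(int dst_w)
{
    BlockSplit s = { 0, 0, 0 };
    int i;

    /* Same bound as the main loop: only complete blocks of 16. */
    for (i = 0; i + 16 <= dst_w; i += 16)
        s.main_blocks++;

    /* Same bound as the tail loop: steps of 4, the last may be short. */
    for (; i < dst_w; i += 4) {
        s.last_rem = dst_w - i >= 4 ? 4 : dst_w - i;
        s.tail_blocks++;
    }
    return s;
}
```

For dst_w = 38 this gives two main blocks (32 pixels) and two tail steps, the last covering the remaining 2 pixels. Before this commit such a width would have skipped the AVX2 path entirely because of the dstW % 16 check.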
[FFmpeg-cvslog] sws: allow avx2 hscale to process inputs of any size.
ffmpeg | branch: master | Alan Kelly | Tue Apr 26 10:00:02 2022 +0200| [a6724285fd45111436dd5242eab2c489182aa5c2] | committer: Anton Khirnov

sws: allow avx2 hscale to process inputs of any size.

The main loop processes blocks of 16 pixels. The tail processes blocks
of size 4.

Signed-off-by: Anton Khirnov

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=a6724285fd45111436dd5242eab2c489182aa5c2
---

 libswscale/x86/scale_avx2.asm | 44 ++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/libswscale/x86/scale_avx2.asm b/libswscale/x86/scale_avx2.asm
index 20acdbd633..37095e596a 100644
--- a/libswscale/x86/scale_avx2.asm
+++ b/libswscale/x86/scale_avx2.asm
@@ -53,6 +53,9 @@ cglobal hscale8to15_%1, 7, 9, 16, pos0, dst, w, srcmem, filter, fltpos, fltsize,
     mova m14, [four]
     shr fltsized, 2
 %endif
+    cmp wq, 0x10
+    jl .tail_loop
+    sub wq, 0x10
 .loop:
     movu m1, [fltposq]
     movu m2, [fltposq+32]
@@ -101,7 +104,46 @@ cglobal hscale8to15_%1, 7, 9, 16, pos0, dst, w, srcmem, filter, fltpos, fltsize,
     add fltposq, 0x40
     add countq, 0x10
     cmp countq, wq
-    jl .loop
+    jle .loop
+
+    add wq, 0x10
+    cmp countq, wq
+    jge .end
+
+.tail_loop:
+    movu xm1, [fltposq]
+%ifidn %1, X4
+    pxor xm9, xm9
+    pxor xm10, xm10
+    xor innerq, innerq
+.tail_innerloop:
+%endif
+    vpcmpeqd xm13, xm13
+    vpgatherdd xm3,[srcmemq + xm1], xm13
+    vpunpcklbw xm5, xm3, xm0
+    vpunpckhbw xm6, xm3, xm0
+    vpmaddwd xm5, xm5, [filterq]
+    vpmaddwd xm6, xm6, [filterq + 0x10]
+    add filterq, 0x20
+%ifidn %1, X4
+    paddd xm9, xm5
+    paddd xm10, xm6
+    paddd xm1, xm14
+    add innerq, 1
+    cmp innerq, fltsizeq
+    jl .tail_innerloop
+    vphaddd xm5, xm9, xm10
+%else
+    vphaddd xm5, xm5, xm6
+%endif
+    vpsrad xm5, 7
+    vpackssdw xm5, xm5, xm5
+    vmovq [dstq + countq * 2], xm5
+    add fltposq, 0x10
+    add countq, 0x4
+    cmp countq, wq
+    jl .tail_loop
+.end:
     REP_RET
 %endmacro
[FFmpeg-cvslog] sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext
ffmpeg | branch: master | Alan Kelly | Wed Aug 17 11:20:39 2022 +0200| [51a34e8525fea2bbc29b42831d7a17f34e8518d3] | committer: Andreas Rheinhardt

sws: Replace call to yuv2yuvX_mmx by yuv2yuvX_mmxext

Signed-off-by: Andreas Rheinhardt

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=51a34e8525fea2bbc29b42831d7a17f34e8518d3
---

 libswscale/x86/swscale.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index 32d441245d..89ef9f5d2b 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -205,20 +205,17 @@ static void yuv2yuvX_ ##opt(const int16_t *filter, int filterSize, \
     int remainder = (dstW % step); \
     int pixelsProcessed = dstW - remainder; \
     if(((uintptr_t)dest) & 15){ \
-        yuv2yuvX_mmx(filter, filterSize, src, dest, dstW, dither, offset); \
+        yuv2yuvX_mmxext(filter, filterSize, src, dest, dstW, dither, offset); \
         return; \
     } \
     if(pixelsProcessed > 0) \
         ff_yuv2yuvX_ ##opt(filter, filterSize - 1, 0, dest - offset, pixelsProcessed + offset, dither, offset); \
     if(remainder > 0){ \
-      ff_yuv2yuvX_mmx(filter, filterSize - 1, pixelsProcessed, dest - offset, pixelsProcessed + remainder + offset, dither, offset); \
+      ff_yuv2yuvX_mmxext(filter, filterSize - 1, pixelsProcessed, dest - offset, pixelsProcessed + remainder + offset, dither, offset); \
     } \
     return; \
 }

-#if HAVE_MMX_EXTERNAL
-YUV2YUVX_FUNC_MMX(mmx, 16)
-#endif
 #if HAVE_MMXEXT_EXTERNAL
 YUV2YUVX_FUNC_MMX(mmxext, 16)
 #endif
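
The macro touched here makes two decisions per row: whether dest is 16-byte aligned, and how many pixels fit the wide routine's step. A hedged C sketch of that decision logic follows. YuvPlan and plan_yuv2yuvx() are hypothetical names; the sketch only reports which path would be taken and calls none of the real swscale functions.

```c
#include <stdint.h>

/* Hypothetical description of how one row would be dispatched. */
typedef struct YuvPlan {
    int use_fallback;   /* 1: dest misaligned, whole row goes to the
                         *    narrower (mmxext) routine               */
    int simd_pixels;    /* pixels handled by the wide routine         */
    int remainder;      /* trailing pixels left for the narrower one  */
} YuvPlan;

static YuvPlan plan_yuv2yuvx(const void *dest, int dst_w, int step)
{
    YuvPlan p = { 0, 0, 0 };

    /* Same test as in the macro: low four address bits must be zero. */
    if ((uintptr_t)dest & 15) {
        p.use_fallback = 1;
        return p;
    }
    p.remainder   = dst_w % step;
    p.simd_pixels = dst_w - p.remainder;
    return p;
}
```

With an aligned destination, dst_w = 50 and step = 16, the wide routine would cover 48 pixels and 2 would be left for the narrower routine, which after this commit is the mmxext one in both places.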
[FFmpeg-cvslog] lavc/aarch64: hevc_add_res add 12bit variants
ffmpeg | branch: master | J. Dekker | Tue Aug 16 07:01:53 2022 +0200| [ce2f47318bdd1586f538059ed36fbf61e825023d] | committer: J. Dekker

lavc/aarch64: hevc_add_res add 12bit variants

hevc_add_res_4x4_12_c: 46.0
hevc_add_res_4x4_12_neon: 18.7
hevc_add_res_8x8_12_c: 194.7
hevc_add_res_8x8_12_neon: 25.2
hevc_add_res_16x16_12_c: 716.0
hevc_add_res_16x16_12_neon: 69.7
hevc_add_res_32x32_12_c: 3820.7
hevc_add_res_32x32_12_neon: 261.0

Signed-off-by: J. Dekker

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=ce2f47318bdd1586f538059ed36fbf61e825023d
---

 libavcodec/aarch64/hevcdsp_idct_neon.S    | 158 +-
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  14 +++
 2 files changed, 102 insertions(+), 70 deletions(-)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 484eea8437..124c50998a 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -5,7 +5,7 @@
  *
  * Ported from arm/hevcdsp_idct_neon.S by
  * Copyright (c) 2020 Reimar Döffinger
- * Copyright (c) 2020 Josh Dekker
+ * Copyright (c) 2020 J. Dekker
  *
  * This file is part of FFmpeg.
  *
@@ -37,11 +37,11 @@ const trans, align=4
         .short 31, 22, 13, 4
 endconst

-.macro clip10 in1, in2, c1, c2
-        smax            \in1, \in1, \c1
-        smax            \in2, \in2, \c1
-        smin            \in1, \in1, \c2
-        smin            \in2, \in2, \c2
+.macro clip2 in1, in2, min, max
+        smax            \in1, \in1, \min
+        smax            \in2, \in2, \min
+        smin            \in1, \in1, \max
+        smin            \in2, \in2, \max
 .endm

 function ff_hevc_add_residual_4x4_8_neon, export=1
@@ -64,25 +64,6 @@ function ff_hevc_add_residual_4x4_8_neon, export=1
         ret
 endfunc

-function ff_hevc_add_residual_4x4_10_neon, export=1
-        mov             x12, x0
-        ld1             {v0.8h-v1.8h}, [x1]
-        ld1             {v2.d}[0], [x12], x2
-        ld1             {v2.d}[1], [x12], x2
-        ld1             {v3.d}[0], [x12], x2
-        sqadd           v0.8h, v0.8h, v2.8h
-        ld1             {v3.d}[1], [x12], x2
-        movi            v4.8h, #0
-        sqadd           v1.8h, v1.8h, v3.8h
-        mvni            v5.8h, #0xFC, lsl #8 // movi #0x3FF
-        clip10          v0.8h, v1.8h, v4.8h, v5.8h
-        st1             {v0.d}[0], [x0], x2
-        st1             {v0.d}[1], [x0], x2
-        st1             {v1.d}[0], [x0], x2
-        st1             {v1.d}[1], [x0], x2
-        ret
-endfunc
-
 function ff_hevc_add_residual_8x8_8_neon, export=1
         add             x12, x0, x2
         add             x2, x2, x2
@@ -103,25 +84,6 @@ function ff_hevc_add_residual_8x8_8_neon, export=1
         ret
 endfunc

-function ff_hevc_add_residual_8x8_10_neon, export=1
-        add             x12, x0, x2
-        add             x2, x2, x2
-        mov             x3, #8
-        movi            v4.8h, #0
-        mvni            v5.8h, #0xFC, lsl #8 // movi #0x3FF
-1:      subs            x3, x3, #2
-        ld1             {v0.8h-v1.8h}, [x1], #32
-        ld1             {v2.8h}, [x0]
-        sqadd           v0.8h, v0.8h, v2.8h
-        ld1             {v3.8h}, [x12]
-        sqadd           v1.8h, v1.8h, v3.8h
-        clip10          v0.8h, v1.8h, v4.8h, v5.8h
-        st1             {v0.8h}, [x0], x2
-        st1             {v1.8h}, [x12], x2
-        bne             1b
-        ret
-endfunc
-
 function ff_hevc_add_residual_16x16_8_neon, export=1
         mov             x3, #16
         add             x12, x0, x2
@@ -148,28 +110,6 @@ function ff_hevc_add_residual_16x16_8_neon, export=1
         ret
 endfunc

-function ff_hevc_add_residual_16x16_10_neon, export=1
-        mov             x3, #16
-        movi            v20.8h, #0
-        mvni            v21.8h, #0xFC, lsl #8 // movi #0x3FF
-        add             x12, x0, x2
-        add             x2, x2, x2
-1:      subs            x3, x3, #2
-        ld1             {v16.8h-v17.8h}, [x0]
-        ld1             {v0.8h-v3.8h}, [x1], #64
-        sqadd           v0.8h, v0.8h, v16.8h
-        ld1             {v18.8h-v19.8h}, [x12]
-        sqadd           v1.8h, v1.8h, v17.8h
-        sqadd           v2.8h, v2.8h, v18.8h
-        sqadd           v3.8h, v3.8h, v19.8h
-        clip10          v0.8h, v1.8h, v20.8h, v21.8h
-        clip10          v2.8h, v3.8h, v20.8h, v21.8h
-        st1             {v0.8h-v1.8h}, [x0], x2
-        st1             {v2.8h-v3.8h}, [x12], x2
-        bne             1b
-        ret
-endfunc
-
 function ff_hevc_add_residual_32x32_8_neon, export=1
         add             x12, x0, x2
         add             x2, x2, x2
@@ -209,10 +149,88 @@ function ff_hevc_add_residual_32x32_8_neon, export=1
         ret
 endfunc

-function ff_hevc_add_residual_32x32_10_neon, export=1
+.macro add_res bitdepth
+function ff_hevc_add_residual_4x4_\bitdepth\()_neon,
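
For orientation, the operation these routines vectorise is an add followed by a clip to the valid sample range, which is what the smax/smin pair in the clip2 macro expresses. A scalar C sketch is given below. clip_pixel() and add_residual_ref() are hypothetical names, and the stride is counted in samples here for simplicity, which is an assumption and may differ from the real DSP function signatures.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp v to [0, 2^bit_depth - 1], e.g. 0x3FF for 10 bit, 0xFFF for 12 bit. */
static uint16_t clip_pixel(int v, int bit_depth)
{
    int max = (1 << bit_depth) - 1;
    return (uint16_t)(v < 0 ? 0 : v > max ? max : v);
}

/* Adds a size x size block of residuals to dst and clips every sample.
 * stride is in samples (assumption made for this sketch). */
static void add_residual_ref(uint16_t *dst, const int16_t *res,
                             ptrdiff_t stride, int size, int bit_depth)
{
    int x, y;

    for (y = 0; y < size; y++) {
        for (x = 0; x < size; x++) {
            int sum = dst[y * stride + x] + res[y * size + x];
            dst[y * stride + x] = clip_pixel(sum, bit_depth);
        }
    }
}
```

Only the upper clip bound depends on the bit depth, which fits the commit's move from per-depth functions to a macro parameterised by bitdepth.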
[FFmpeg-cvslog] aarch64: me_cmp: Remove a leftover unnecessary instruction
ffmpeg | branch: master | Martin Storsjö | Thu Aug 18 12:14:15 2022 +0300| [48be6616d0536c5b0ff3ee58caee4c024ca64116] | committer: Martin Storsjö

aarch64: me_cmp: Remove a leftover unnecessary instruction

This was missed in a2e45ad407c526cd5ce2f3a361fb98084228cd6e.

Signed-off-by: Martin Storsjö

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=48be6616d0536c5b0ff3ee58caee4c024ca64116
---

 libavcodec/aarch64/me_cmp_neon.S | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index b89c25438e..4198985c6c 100644
--- a/libavcodec/aarch64/me_cmp_neon.S
+++ b/libavcodec/aarch64/me_cmp_neon.S
@@ -328,7 +328,6 @@ function ff_pix_abs16_y2_neon, export=1
         // initialize buffers
         movi            v29.8h, #0                  // clear the accumulator
         movi            v28.8h, #0                  // clear the accumulator
-        movi            d18, #0
         add             x5, x2, x3                  // pix2 + stride
         cmp             w4, #4
         b.lt            2f
@@ -386,9 +385,8 @@ function ff_pix_abs16_y2_neon, export=1
 3:
         add             v29.8h, v29.8h, v28.8h      // Add vectors together
         uaddlv          s16, v29.8h                 // Add up vector values
-        add             d18, d18, d16

-        fmov            w0, s18
+        fmov            w0, s16
         ret
 endfunc
[FFmpeg-cvslog] lavc/aarch64: Add neon implementation for pix_abs8
ffmpeg | branch: master | Hubert Mazur | Tue Aug 16 14:20:16 2022 +0200| [70efa4d01188b61efc0b82e7241a59a32c7e2e22] | committer: Martin Storsjö lavc/aarch64: Add neon implementation for pix_abs8 Provide optimized implementation of pix_abs8 function for arm64. Performance comparison tests are shown below. - pix_abs_1_0_c: 101.2 - pix_abs_1_0_neon: 22.5 - sad_1_c: 101.2 - sad_1_neon: 22.5 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Martin Storsjö > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=70efa4d01188b61efc0b82e7241a59a32c7e2e22 --- libavcodec/aarch64/me_cmp_init_aarch64.c | 4 +++ libavcodec/aarch64/me_cmp_neon.S | 47 2 files changed, 51 insertions(+) diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c index 7c03ce8c50..fb7c3f5059 100644 --- a/libavcodec/aarch64/me_cmp_init_aarch64.c +++ b/libavcodec/aarch64/me_cmp_init_aarch64.c @@ -31,6 +31,8 @@ int ff_pix_abs16_x2_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t * ptrdiff_t stride, int h); int ff_pix_abs16_y2_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, ptrdiff_t stride, int h); +int ff_pix_abs8_neon(MpegEncContext *s, const uint8_t *blk1, const uint8_t *blk2, + ptrdiff_t stride, int h); int sse16_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, ptrdiff_t stride, int h); @@ -48,8 +50,10 @@ av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx) c->pix_abs[0][1] = ff_pix_abs16_x2_neon; c->pix_abs[0][2] = ff_pix_abs16_y2_neon; c->pix_abs[0][3] = ff_pix_abs16_xy2_neon; +c->pix_abs[1][0] = ff_pix_abs8_neon; c->sad[0] = ff_pix_abs16_neon; +c->sad[1] = ff_pix_abs8_neon; c->sse[0] = sse16_neon; c->sse[1] = sse8_neon; c->sse[2] = sse4_neon; diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S index c0647c49e9..b89c25438e 100644 --- a/libavcodec/aarch64/me_cmp_neon.S +++ b/libavcodec/aarch64/me_cmp_neon.S @@ -72,6 +72,53 @@ 
function ff_pix_abs16_neon, export=1
         ret
 endfunc
 
+function ff_pix_abs8_neon, export=1
+        // x0 unused
+        // x1 uint8_t *pix1
+        // x2 uint8_t *pix2
+        // x3 ptrdiff_t stride
+        // w4 int h
+
+        movi    v30.8h, #0
+        cmp     w4, #4
+        b.lt    2f
+
+// make 4 iterations at once
+1:
+        ld1     {v0.8b}, [x1], x3         // Load pix1 for first iteration
+        ld1     {v1.8b}, [x2], x3         // Load pix2 for first iteration
+        ld1     {v2.8b}, [x1], x3         // Load pix1 for second iteration
+        uabal   v30.8h, v0.8b, v1.8b      // Absolute difference, first iteration
+        ld1     {v3.8b}, [x2], x3         // Load pix2 for second iteration
+        ld1     {v4.8b}, [x1], x3         // Load pix1 for third iteration
+        uabal   v30.8h, v2.8b, v3.8b      // Absolute difference, second iteration
+        ld1     {v5.8b}, [x2], x3         // Load pix2 for third iteration
+        sub     w4, w4, #4                // h -= 4
+        ld1     {v6.8b}, [x1], x3         // Load pix1 for fourth iteration
+        ld1     {v7.8b}, [x2], x3         // Load pix2 for fourth iteration
+        uabal   v30.8h, v4.8b, v5.8b      // Absolute difference, third iteration
+        cmp     w4, #4
+        uabal   v30.8h, v6.8b, v7.8b      // Absolute difference, fourth iteration
+        b.ge    1b
+
+        cbz     w4, 3f
+
+// iterate by one
+2:
+        ld1     {v0.8b}, [x1], x3         // Load pix1
+        ld1     {v1.8b}, [x2], x3         // Load pix2
+
+        subs    w4, w4, #1
+        uabal   v30.8h, v0.8b, v1.8b
+        b.ne    2b
+
+3:
+        uaddlv  s20, v30.8h               // Add up vector
+        fmov    w0, s20
+
+        ret
+endfunc
+
 function ff_pix_abs16_xy2_neon, export=1
         // x0 unused
         // x1 uint8_t *pix1
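For readers who do not read NEON, the quantity computed above can be sketched in plain C. This is only an illustrative scalar model, not FFmpeg's C fallback; the name sad8_ref is hypothetical, and the block width of 8 is taken from the 8-byte loads in the assembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model (not FFmpeg code): sum of absolute
 * differences over a block that is 8 pixels wide and h rows high.
 * Both pointers advance by the same stride after every row, as the
 * post-indexed ld1 loads in the assembly do. */
int sad8_ref(const uint8_t *pix1, const uint8_t *pix2,
             ptrdiff_t stride, int h)
{
    int sum = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++)
            sum += abs(pix1[x] - pix2[x]);
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

The NEON version keeps eight 16-bit partial sums in v30 (one per column) and only reduces them to a scalar at label 3; the scalar model folds everything into one accumulator, which gives the same total.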
[FFmpeg-cvslog] lavc/aarch64: Add neon implementation for sse8
ffmpeg | branch: master | Hubert Mazur | Tue Aug 16 14:20:15 2022 +0200| [74312e80d74eebf095d0092a6bb2f1f207626174] | committer: Martin Storsjö lavc/aarch64: Add neon implementation for sse8 Provide optimized implementation of sse8 function for arm64. Performance comparison tests are shown below. - sse_1_c: 130.7 - sse_1_neon: 29.7 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur Signed-off-by: Martin Storsjö > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=74312e80d74eebf095d0092a6bb2f1f207626174 --- libavcodec/aarch64/me_cmp_init_aarch64.c | 3 ++ libavcodec/aarch64/me_cmp_neon.S | 64 2 files changed, 67 insertions(+) diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c index 30737e2436..7c03ce8c50 100644 --- a/libavcodec/aarch64/me_cmp_init_aarch64.c +++ b/libavcodec/aarch64/me_cmp_init_aarch64.c @@ -34,6 +34,8 @@ int ff_pix_abs16_y2_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t * int sse16_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, ptrdiff_t stride, int h); +int sse8_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, + ptrdiff_t stride, int h); int sse4_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, ptrdiff_t stride, int h); @@ -49,6 +51,7 @@ av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx) c->sad[0] = ff_pix_abs16_neon; c->sse[0] = sse16_neon; +c->sse[1] = sse8_neon; c->sse[2] = sse4_neon; } } diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S index 26490e189f..c0647c49e9 100644 --- a/libavcodec/aarch64/me_cmp_neon.S +++ b/libavcodec/aarch64/me_cmp_neon.S @@ -420,6 +420,70 @@ function sse16_neon, export=1 ret endfunc +function sse8_neon, export=1 +// x0 - unused +// x1 - pix1 +// x2 - pix2 +// x3 - stride +// w4 - h + +moviv21.4s, #0 +moviv20.4s, #0 +cmp w4, #4 +b.le2f + +// make 4 iterations at once +1: + +// res = abs(pix1[0] - pix2[0]) 
+// res * res
+
+        ld1     {v0.8b}, [x1], x3         // Load pix1 for first iteration
+        ld1     {v1.8b}, [x2], x3         // Load pix2 for first iteration
+        ld1     {v2.8b}, [x1], x3         // Load pix1 for second iteration
+        ld1     {v3.8b}, [x2], x3         // Load pix2 for second iteration
+        uabdl   v30.8h, v0.8b, v1.8b      // Absolute difference, first iteration
+        ld1     {v4.8b}, [x1], x3         // Load pix1 for third iteration
+        ld1     {v5.8b}, [x2], x3         // Load pix2 for third iteration
+        uabdl   v29.8h, v2.8b, v3.8b      // Absolute difference, second iteration
+        umlal   v21.4s, v30.4h, v30.4h    // Multiply lower half, first iteration
+        ld1     {v6.8b}, [x1], x3         // Load pix1 for fourth iteration
+        ld1     {v7.8b}, [x2], x3         // Load pix2 for fourth iteration
+        uabdl   v28.8h, v4.8b, v5.8b      // Absolute difference, third iteration
+        umlal   v21.4s, v29.4h, v29.4h    // Multiply lower half, second iteration
+        umlal2  v20.4s, v30.8h, v30.8h    // Multiply upper half, first iteration
+        uabdl   v27.8h, v6.8b, v7.8b      // Absolute difference, fourth iteration
+        umlal   v21.4s, v28.4h, v28.4h    // Multiply lower half, third iteration
+        umlal2  v20.4s, v29.8h, v29.8h    // Multiply upper half, second iteration
+        sub     w4, w4, #4                // h -= 4
+        umlal2  v20.4s, v28.8h, v28.8h    // Multiply upper half, third iteration
+        umlal   v21.4s, v27.4h, v27.4h    // Multiply lower half, fourth iteration
+        cmp     w4, #4
+        umlal2  v20.4s, v27.8h, v27.8h    // Multiply upper half, fourth iteration
+        b.ge    1b
+
+        cbz     w4, 3f
+
+// iterate by one
+2:
+        ld1     {v0.8b}, [x1], x3         // Load pix1
+        ld1     {v1.8b}, [x2], x3         // Load pix2
+        subs    w4, w4, #1
+        uabdl   v30.8h, v0.8b, v1.8b
+        umlal   v21.4s, v30.4h, v30.4h
+        umlal2  v20.4s, v30.8h, v30.8h
+
+        b.ne    2b
+
+3:
+        add     v21.4s, v21.4s, v20.4s    // Add accumulator vectors together
+        uaddlv  d17, v21.4s               // Add up vector
+
+        fmov    w0, s17
+        ret
+
+endfunc
+
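The two-line comment in the assembly ("res = abs(pix1[0] - pix2[0])", then "res * res") describes a sum of squared errors. A minimal scalar sketch in C, assuming an 8-pixel-wide block as implied by the 8-byte loads; sse8_ref is a hypothetical name and this is not FFmpeg's own C implementation:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model (not FFmpeg code): sum of squared
 * differences over an 8 x h block. Squaring the signed difference
 * gives the same value as squaring its absolute value, so no explicit
 * abs() is needed here, unlike in the unsigned NEON arithmetic. */
int sse8_ref(const uint8_t *pix1, const uint8_t *pix2,
             ptrdiff_t stride, int h)
{
    int sum = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            int d = pix1[x] - pix2[x];
            sum += d * d;
        }
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

The assembly takes the absolute difference first (uabdl) because it works on unsigned lanes; the widening multiply-accumulate (umlal/umlal2) then squares and sums into 32-bit lanes.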
[FFmpeg-cvslog] lavc/aarch64: Add neon implementation for pix_abs16_y2
ffmpeg | branch: master | Hubert Mazur | Tue Aug 16 14:20:14 2022 +0200| [a2e45ad407c526cd5ce2f3a361fb98084228cd6e] | committer: Martin Storsjö lavc/aarch64: Add neon implementation for pix_abs16_y2 Provide optimized implementation of pix_abs16_y2 function for arm64. Performance comparison tests are shown below. pix_abs_0_2_c: 317.2 pix_abs_0_2_neon: 37.5 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur Signed-off-by: Martin Storsjö > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=a2e45ad407c526cd5ce2f3a361fb98084228cd6e --- libavcodec/aarch64/me_cmp_init_aarch64.c | 3 ++ libavcodec/aarch64/me_cmp_neon.S | 75 2 files changed, 78 insertions(+) diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c index 57722b6a9a..30737e2436 100644 --- a/libavcodec/aarch64/me_cmp_init_aarch64.c +++ b/libavcodec/aarch64/me_cmp_init_aarch64.c @@ -29,6 +29,8 @@ int ff_pix_abs16_xy2_neon(MpegEncContext *s, const uint8_t *blk1, const uint8_t ptrdiff_t stride, int h); int ff_pix_abs16_x2_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, ptrdiff_t stride, int h); +int ff_pix_abs16_y2_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, + ptrdiff_t stride, int h); int sse16_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, ptrdiff_t stride, int h); @@ -42,6 +44,7 @@ av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx) if (have_neon(cpu_flags)) { c->pix_abs[0][0] = ff_pix_abs16_neon; c->pix_abs[0][1] = ff_pix_abs16_x2_neon; +c->pix_abs[0][2] = ff_pix_abs16_y2_neon; c->pix_abs[0][3] = ff_pix_abs16_xy2_neon; c->sad[0] = ff_pix_abs16_neon; diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S index f3201739b8..26490e189f 100644 --- a/libavcodec/aarch64/me_cmp_neon.S +++ b/libavcodec/aarch64/me_cmp_neon.S @@ -271,6 +271,81 @@ function ff_pix_abs16_x2_neon, export=1 ret endfunc +function ff_pix_abs16_y2_neon, 
export=1
+        // x0 unused
+        // x1 uint8_t *pix1
+        // x2 uint8_t *pix2
+        // x3 ptrdiff_t stride
+        // w4 int h
+
+        // initialize buffers
+        movi    v29.8h, #0                // clear the accumulator
+        movi    v28.8h, #0                // clear the accumulator
+        movi    d18, #0
+        add     x5, x2, x3                // pix2 + stride
+        cmp     w4, #4
+        b.lt    2f
+
+// make 4 iterations at once
+1:
+
+        // abs(pix1[0] - avg2(pix2[0], pix2[0 + stride]))
+        // avg2(a, b) = (((a) + (b) + 1) >> 1)
+        // abs(x) = (x < 0 ? (-x) : (x))
+
+        ld1     {v1.16b}, [x2], x3        // Load pix2 for first iteration
+        ld1     {v2.16b}, [x5], x3        // Load pix3 for first iteration
+        ld1     {v0.16b}, [x1], x3        // Load pix1 for first iteration
+        urhadd  v30.16b, v1.16b, v2.16b   // Rounding halving add, first iteration
+        ld1     {v4.16b}, [x2], x3        // Load pix2 for second iteration
+        ld1     {v5.16b}, [x5], x3        // Load pix3 for second iteration
+        uabal   v29.8h, v0.8b, v30.8b     // Absolute difference of lower half, first iteration
+        uabal2  v28.8h, v0.16b, v30.16b   // Absolute difference of upper half, first iteration
+        ld1     {v3.16b}, [x1], x3        // Load pix1 for second iteration
+        urhadd  v27.16b, v4.16b, v5.16b   // Rounding halving add, second iteration
+        ld1     {v7.16b}, [x2], x3        // Load pix2 for third iteration
+        ld1     {v20.16b}, [x5], x3       // Load pix3 for third iteration
+        uabal   v29.8h, v3.8b, v27.8b     // Absolute difference of lower half for second iteration
+        uabal2  v28.8h, v3.16b, v27.16b   // Absolute difference of upper half for second iteration
+        ld1     {v6.16b}, [x1], x3        // Load pix1 for third iteration
+        urhadd  v26.16b, v7.16b, v20.16b  // Rounding halving add, third iteration
+        ld1     {v22.16b}, [x2], x3       // Load pix2 for fourth iteration
+        ld1     {v23.16b}, [x5], x3       // Load pix3 for fourth iteration
+        uabal   v29.8h, v6.8b, v26.8b     // Absolute difference of lower half for third iteration
+        uabal2  v28.8h, v6.16b, v26.16b   // Absolute difference of upper half for third iteration
+        ld1     {v21.16b},
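The comments in the (truncated) listing above define the per-pixel operation: pix1 is compared against the rounded average of pix2 and the row below it. A scalar sketch of that, assuming a 16-pixel-wide block as implied by the 16-byte loads; pix_abs16_y2_ref and avg2 are hypothetical names used only for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounded average of two pixels, as given in the assembly comment.
 * The NEON urhadd instruction computes this per byte lane. */
static inline int avg2(int a, int b)
{
    return (a + b + 1) >> 1;
}

/* Hypothetical scalar model (not FFmpeg code): SAD between pix1 and
 * the vertical half-pel interpolation of pix2 over a 16 x h block.
 * Note that h + 1 rows of pix2 are read. */
int pix_abs16_y2_ref(const uint8_t *pix1, const uint8_t *pix2,
                     ptrdiff_t stride, int h)
{
    const uint8_t *pix3 = pix2 + stride; /* the row below pix2 (x5 in the asm) */
    int sum = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(pix1[x] - avg2(pix2[x], pix3[x]));
        pix1 += stride;
        pix2 += stride;
        pix3 += stride;
    }
    return sum;
}
```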
[FFmpeg-cvslog] lavc/aarch64: Add neon implementation for sse16
ffmpeg | branch: master | Hubert Mazur | Tue Aug 16 14:20:12 2022 +0200| [ad251fd26243d93093206a511cb547f46b967e4c] | committer: Martin Storsjö lavc/aarch64: Add neon implementation for sse16 Provide neon implementation for sse16 function. Performance comparison tests are shown below. - sse_0_c: 268.2 - sse_0_neon: 43.5 Benchmarks and tests run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur Signed-off-by: Martin Storsjö > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=ad251fd26243d93093206a511cb547f46b967e4c --- libavcodec/aarch64/me_cmp_init_aarch64.c | 4 ++ libavcodec/aarch64/me_cmp_neon.S | 74 2 files changed, 78 insertions(+) diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c index dfb9583320..ab2a1909ba 100644 --- a/libavcodec/aarch64/me_cmp_init_aarch64.c +++ b/libavcodec/aarch64/me_cmp_init_aarch64.c @@ -30,6 +30,9 @@ int ff_pix_abs16_xy2_neon(MpegEncContext *s, const uint8_t *blk1, const uint8_t int ff_pix_abs16_x2_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, ptrdiff_t stride, int h); +int sse16_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, + ptrdiff_t stride, int h); + av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx) { int cpu_flags = av_get_cpu_flags(); @@ -40,5 +43,6 @@ av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx) c->pix_abs[0][3] = ff_pix_abs16_xy2_neon; c->sad[0] = ff_pix_abs16_neon; +c->sse[0] = sse16_neon; } } diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S index cda7ce0408..b98b2b7e03 100644 --- a/libavcodec/aarch64/me_cmp_neon.S +++ b/libavcodec/aarch64/me_cmp_neon.S @@ -270,3 +270,77 @@ function ff_pix_abs16_x2_neon, export=1 ret endfunc + +function sse16_neon, export=1 +// x0 - unused +// x1 - pix1 +// x2 - pix2 +// x3 - stride +// w4 - h + +cmp w4, #4 +moviv17.4s, #0 +b.lt2f + +// Make 4 iterations at once +1: + +// res = abs(pix1[0] - 
pix2[0])
+// res * res
+
+        ld1     {v0.16b}, [x1], x3        // Load pix1 vector for first iteration
+        ld1     {v1.16b}, [x2], x3        // Load pix2 vector for first iteration
+        ld1     {v2.16b}, [x1], x3        // Load pix1 vector for second iteration
+        uabd    v30.16b, v0.16b, v1.16b   // Absolute difference, first iteration
+        ld1     {v3.16b}, [x2], x3        // Load pix2 vector for second iteration
+        umull   v29.8h, v30.8b, v30.8b    // Multiply lower half of vectors, first iteration
+        umull2  v28.8h, v30.16b, v30.16b  // Multiply upper half of vectors, first iteration
+        uabd    v27.16b, v2.16b, v3.16b   // Absolute difference, second iteration
+        uadalp  v17.4s, v29.8h            // Pairwise add, first iteration
+        ld1     {v4.16b}, [x1], x3        // Load pix1 for third iteration
+        umull   v26.8h, v27.8b, v27.8b    // Multiply lower half, second iteration
+        umull2  v25.8h, v27.16b, v27.16b  // Multiply upper half, second iteration
+        ld1     {v5.16b}, [x2], x3        // Load pix2 for third iteration
+        uadalp  v17.4s, v26.8h            // Pairwise add and accumulate, second iteration
+        uabd    v24.16b, v4.16b, v5.16b   // Absolute difference, third iteration
+        ld1     {v6.16b}, [x1], x3        // Load pix1 for fourth iteration
+        uadalp  v17.4s, v25.8h            // Pairwise add and accumulate, second iteration
+        umull   v23.8h, v24.8b, v24.8b    // Multiply lower half, third iteration
+        umull2  v22.8h, v24.16b, v24.16b  // Multiply upper half, third iteration
+        uadalp  v17.4s, v23.8h            // Pairwise add and accumulate, third iteration
+        ld1     {v7.16b}, [x2], x3        // Load pix2 for fourth iteration
+        uadalp  v17.4s, v22.8h            // Pairwise add and accumulate, third iteration
+        uabd    v21.16b, v6.16b, v7.16b   // Absolute difference, fourth iteration
+        uadalp  v17.4s, v28.8h            // Pairwise add and accumulate, first iteration
+        umull   v20.8h, v21.8b, v21.8b    // Multiply lower half, fourth iteration
+        sub     w4, w4, #4                // h -= 4
+        umull2  v19.8h, v21.16b, v21.16b  // Multiply upper half, fourth iteration
+        uadalp  v17.4s, v20.8h
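The uabd / umull / uadalp sequence widens the data in two steps: 8-bit differences are squared into 16-bit products, and adjacent products are then added pairwise into 32-bit accumulator lanes. The C sketch below models one 16-pixel row that way. It is an illustration only; sse16_row_widening is a hypothetical name, and the final horizontal sum is an assumption, since the end of the function is cut off in this message.

```c
#include <stdint.h>

/* Hypothetical lane-by-lane model of one row (not FFmpeg code). */
uint32_t sse16_row_widening(const uint8_t *pix1, const uint8_t *pix2)
{
    uint8_t  d[16];        /* uabd: per-byte absolute difference        */
    uint16_t sq[16];       /* umull/umull2: 8-bit x 8-bit -> 16-bit     */
    uint32_t acc[4] = {0}; /* models the four 32-bit lanes of v17.4s    */

    for (int i = 0; i < 16; i++)
        d[i] = pix1[i] > pix2[i] ? pix1[i] - pix2[i] : pix2[i] - pix1[i];

    /* 255 * 255 = 65025 still fits in an unsigned 16-bit lane. */
    for (int i = 0; i < 16; i++)
        sq[i] = (uint16_t)(d[i] * d[i]);

    /* uadalp: add adjacent 16-bit pairs and accumulate into 32-bit
     * lanes, once for the lower eight products and once for the upper. */
    for (int half = 0; half < 2; half++)
        for (int lane = 0; lane < 4; lane++)
            acc[lane] += (uint32_t)sq[half * 8 + 2 * lane]
                       + (uint32_t)sq[half * 8 + 2 * lane + 1];

    /* Assumed final reduction across lanes. */
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Accumulating only after widening to 32 bits is what keeps the sum from overflowing: a single 16-bit lane could not hold even two maximal products.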
[FFmpeg-cvslog] lavc/aarch64: Add neon implementation for sse4
ffmpeg | branch: master | Hubert Mazur | Tue Aug 16 14:20:13 2022 +0200| [d7abb7d143fd1fbacb0084a8936bc4029afe5111] | committer: Martin Storsjö lavc/aarch64: Add neon implementation for sse4 Provide neon implementation for sse4 function. Performance comparison tests are shown below. - sse_2_c: 80.7 - sse_2_neon: 31.0 Benchmarks and tests are run with checkasm tool on AWS Graviton 3. Signed-off-by: Hubert Mazur Signed-off-by: Martin Storsjö > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=d7abb7d143fd1fbacb0084a8936bc4029afe5111 --- libavcodec/aarch64/me_cmp_init_aarch64.c | 3 ++ libavcodec/aarch64/me_cmp_neon.S | 56 2 files changed, 59 insertions(+) diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c index ab2a1909ba..57722b6a9a 100644 --- a/libavcodec/aarch64/me_cmp_init_aarch64.c +++ b/libavcodec/aarch64/me_cmp_init_aarch64.c @@ -32,6 +32,8 @@ int ff_pix_abs16_x2_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t * int sse16_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, ptrdiff_t stride, int h); +int sse4_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, + ptrdiff_t stride, int h); av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx) { @@ -44,5 +46,6 @@ av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx) c->sad[0] = ff_pix_abs16_neon; c->sse[0] = sse16_neon; +c->sse[2] = sse4_neon; } } diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S index b98b2b7e03..f3201739b8 100644 --- a/libavcodec/aarch64/me_cmp_neon.S +++ b/libavcodec/aarch64/me_cmp_neon.S @@ -344,3 +344,59 @@ function sse16_neon, export=1 ret endfunc + +function sse4_neon, export=1 +// x0 - unused +// x1 - pix1 +// x2 - pix2 +// x3 - stride +// w4 - h + +moviv16.4s, #0 // clear the result accumulator +cmp w4, #4 +b.le2f + +// make 4 iterations at once +1: + +// res = abs(pix1[0] - pix2[0]) +// res * res + +ld1 {v0.s}[0], [x1], x3 
// Load pix1, first iteration
+        ld1     {v1.s}[0], [x2], x3       // Load pix2, first iteration
+        ld1     {v2.s}[0], [x1], x3       // Load pix1, second iteration
+        ld1     {v3.s}[0], [x2], x3       // Load pix2, second iteration
+        uabdl   v30.8h, v0.8b, v1.8b      // Absolute difference, first iteration
+        ld1     {v4.s}[0], [x1], x3       // Load pix1, third iteration
+        ld1     {v5.s}[0], [x2], x3       // Load pix2, third iteration
+        uabdl   v29.8h, v2.8b, v3.8b      // Absolute difference, second iteration
+        umlal   v16.4s, v30.4h, v30.4h    // Multiply vectors, first iteration
+        ld1     {v6.s}[0], [x1], x3       // Load pix1, fourth iteration
+        ld1     {v7.s}[0], [x2], x3       // Load pix2, fourth iteration
+        uabdl   v28.8h, v4.8b, v5.8b      // Absolute difference, third iteration
+        umlal   v16.4s, v29.4h, v29.4h    // Multiply and accumulate, second iteration
+        sub     w4, w4, #4
+        uabdl   v27.8h, v6.8b, v7.8b      // Absolute difference, fourth iteration
+        umlal   v16.4s, v28.4h, v28.4h    // Multiply and accumulate, third iteration
+        cmp     w4, #4
+        umlal   v16.4s, v27.4h, v27.4h    // Multiply and accumulate, fourth iteration
+        b.ge    1b
+
+        cbz     w4, 3f
+
+// iterate by one
+2:
+        ld1     {v0.s}[0], [x1], x3       // Load pix1
+        ld1     {v1.s}[0], [x2], x3       // Load pix2
+        uabdl   v30.8h, v0.8b, v1.8b
+        subs    w4, w4, #1
+        umlal   v16.4s, v30.4h, v30.4h
+
+        b.ne    2b
+
+3:
+        uaddlv  d17, v16.4s               // Add vector
+        fmov    w0, s17
+
+        ret
+endfunc
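The control flow of sse4_neon (a four-row unrolled loop at label 1, a one-row tail loop at label 2, the reduction at label 3) can be sketched in C as follows. This is a rough model under stated assumptions, not the author's code: sse4_unrolled and row_sse4 are hypothetical names, and the tail is written as a guarded while loop, which is slightly more defensive than the assembly for h == 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Squared error of one 4-pixel row (hypothetical helper). */
static int row_sse4(const uint8_t *pix1, const uint8_t *pix2)
{
    int sum = 0;

    for (int x = 0; x < 4; x++) {
        int d = pix1[x] - pix2[x];
        sum += d * d;
    }
    return sum;
}

/* Hypothetical model of the loop structure (not FFmpeg code). */
int sse4_unrolled(const uint8_t *pix1, const uint8_t *pix2,
                  ptrdiff_t stride, int h)
{
    int sum = 0;

    /* Mirrors "cmp w4, #4; b.le 2f": the unrolled body is entered
     * only when h > 4, so h == 4 is handled entirely by the tail. */
    if (h > 4) {
        do {
            for (int r = 0; r < 4; r++) {
                sum  += row_sse4(pix1, pix2);
                pix1 += stride;
                pix2 += stride;
            }
            h -= 4;
        } while (h >= 4); /* "cmp w4, #4; b.ge 1b" */
    }

    /* Tail: one row per pass, as in the loop at label 2. */
    while (h > 0) {
        sum  += row_sse4(pix1, pix2);
        pix1 += stride;
        pix2 += stride;
        h--;
    }
    return sum;
}
```

Because the first branch is b.le rather than b.lt, a block with exactly four rows takes the slower one-row path; the result is the same either way.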
[FFmpeg-cvslog] aarch64: me_cmp: Fix the indentation of function declarations
ffmpeg | branch: master | Martin Storsjö | Thu Aug 18 12:00:20 2022 +0300| [60109d5b3d7bc88703fd4edfa282f25d0653016b] | committer: Martin Storsjö aarch64: me_cmp: Fix the indentation of function declarations Signed-off-by: Martin Storsjö > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=60109d5b3d7bc88703fd4edfa282f25d0653016b --- libavcodec/aarch64/me_cmp_init_aarch64.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c index 79c739914f..dfb9583320 100644 --- a/libavcodec/aarch64/me_cmp_init_aarch64.c +++ b/libavcodec/aarch64/me_cmp_init_aarch64.c @@ -26,9 +26,9 @@ int ff_pix_abs16_neon(MpegEncContext *s, const uint8_t *blk1, const uint8_t *blk2, ptrdiff_t stride, int h); int ff_pix_abs16_xy2_neon(MpegEncContext *s, const uint8_t *blk1, const uint8_t *blk2, - ptrdiff_t stride, int h); + ptrdiff_t stride, int h); int ff_pix_abs16_x2_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2, - ptrdiff_t stride, int h); + ptrdiff_t stride, int h); av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx) {