date:20211024

[FFmpeg-devel] [PATCH] swscale/x86/output.asm: add x86-optimized planar gbr yuv2anyX functions

2021-10-24 Thread mindmark

From: Mark Reid 

yuv2gbrp_full_X_4_512_c: 12096.6
yuv2gbrp_full_X_4_512_sse2: 10782.6
yuv2gbrp_full_X_4_512_sse4: 5143.6
yuv2gbrp_full_X_4_512_avx2: 3000.1
yuv2gbrap_full_X_4_512_c: 15463.1
yuv2gbrap_full_X_4_512_sse2: 14296.6
yuv2gbrap_full_X_4_512_sse4: 6319.1
yuv2gbrap_full_X_4_512_avx2: 3554.1
yuv2gbrp9be_full_X_4_512_c: 14281.6
yuv2gbrp9be_full_X_4_512_sse2: 11206.1
yuv2gbrp9be_full_X_4_512_sse4: 5033.6
yuv2gbrp9be_full_X_4_512_avx2: 3012.6
yuv2gbrp9le_full_X_4_512_c: 12688.6
yuv2gbrp9le_full_X_4_512_sse2: 10914.1
yuv2gbrp9le_full_X_4_512_sse4: 5144.6
yuv2gbrp9le_full_X_4_512_avx2: 3014.6
yuv2gbrp10be_full_X_4_512_c: 14257.6
yuv2gbrp10be_full_X_4_512_sse2: 11089.6
yuv2gbrp10be_full_X_4_512_sse4: 5039.1
yuv2gbrp10be_full_X_4_512_avx2: 3001.1
yuv2gbrp10le_full_X_4_512_c: 12098.6
yuv2gbrp10le_full_X_4_512_sse2: 10884.1
yuv2gbrp10le_full_X_4_512_sse4: 5138.1
yuv2gbrp10le_full_X_4_512_avx2: 2999.6
yuv2gbrap10be_full_X_4_512_c: 18549.6
yuv2gbrap10be_full_X_4_512_sse2: 14538.6
yuv2gbrap10be_full_X_4_512_sse4: 6292.6
yuv2gbrap10be_full_X_4_512_avx2: 3583.6
yuv2gbrap10le_full_X_4_512_c: 16631.1
yuv2gbrap10le_full_X_4_512_sse2: 14190.6
yuv2gbrap10le_full_X_4_512_sse4: 6348.1
yuv2gbrap10le_full_X_4_512_avx2: 3554.6
yuv2gbrp12be_full_X_4_512_c: 13555.1
yuv2gbrp12be_full_X_4_512_sse2: 10952.1
yuv2gbrp12be_full_X_4_512_sse4: 5137.6
yuv2gbrp12be_full_X_4_512_avx2: 3009.6
yuv2gbrp12le_full_X_4_512_c: 12082.6
yuv2gbrp12le_full_X_4_512_sse2: 10891.1
yuv2gbrp12le_full_X_4_512_sse4: 5184.1
yuv2gbrp12le_full_X_4_512_avx2: 3011.1
yuv2gbrap12be_full_X_4_512_c: 18689.6
yuv2gbrap12be_full_X_4_512_sse2: 14522.6
yuv2gbrap12be_full_X_4_512_sse4: 6237.6
yuv2gbrap12be_full_X_4_512_avx2: 3585.6
yuv2gbrap12le_full_X_4_512_c: 16760.6
yuv2gbrap12le_full_X_4_512_sse2: 14202.1
yuv2gbrap12le_full_X_4_512_sse4: 6252.1
yuv2gbrap12le_full_X_4_512_avx2: 3591.1
yuv2gbrp14be_full_X_4_512_c: 13555.6
yuv2gbrp14be_full_X_4_512_sse2: 10949.1
yuv2gbrp14be_full_X_4_512_sse4: 5185.1
yuv2gbrp14be_full_X_4_512_avx2: 3012.1
yuv2gbrp14le_full_X_4_512_c: 12068.1
yuv2gbrp14le_full_X_4_512_sse2: 10883.6
yuv2gbrp14le_full_X_4_512_sse4: 5145.1
yuv2gbrp14le_full_X_4_512_avx2: 3007.1
yuv2gbrp16be_full_X_4_512_c: 12383.6
yuv2gbrp16be_full_X_4_512_sse2: 8230.6
yuv2gbrp16be_full_X_4_512_sse4: 4765.6
yuv2gbrp16be_full_X_4_512_avx2: 2742.6
yuv2gbrp16le_full_X_4_512_c: 10906.1
yuv2gbrp16le_full_X_4_512_sse2: 28732.1
yuv2gbrp16le_full_X_4_512_sse4: 4709.6
yuv2gbrp16le_full_X_4_512_avx2: 2753.1
yuv2gbrap16be_full_X_4_512_c: 15472.6
yuv2gbrap16be_full_X_4_512_sse2: 11021.6
yuv2gbrap16be_full_X_4_512_sse4: 5487.6
yuv2gbrap16be_full_X_4_512_avx2: 3143.6
yuv2gbrap16le_full_X_4_512_c: 13668.6
yuv2gbrap16le_full_X_4_512_sse2: 10562.1
yuv2gbrap16le_full_X_4_512_sse4: 5506.6
yuv2gbrap16le_full_X_4_512_avx2: 3149.6
yuv2gbrpf32be_full_X_4_512_c: 15471.1
yuv2gbrpf32be_full_X_4_512_sse2: 8524.6
yuv2gbrpf32be_full_X_4_512_sse4: 4559.1
yuv2gbrpf32be_full_X_4_512_avx2: 2388.1
yuv2gbrpf32le_full_X_4_512_c: 14247.6
yuv2gbrpf32le_full_X_4_512_sse2: 7600.6
yuv2gbrpf32le_full_X_4_512_sse4: 4385.6
yuv2gbrpf32le_full_X_4_512_avx2: 2258.6
yuv2gbrapf32be_full_X_4_512_c: 18412.1
yuv2gbrapf32be_full_X_4_512_sse2: 11353.6
yuv2gbrapf32be_full_X_4_512_sse4: 5807.1
yuv2gbrapf32be_full_X_4_512_avx2: 2928.1
yuv2gbrapf32le_full_X_4_512_c: 16485.1
yuv2gbrapf32le_full_X_4_512_sse2: 10202.1
yuv2gbrapf32le_full_X_4_512_sse4: 5571.6
yuv2gbrapf32le_full_X_4_512_avx2: 2847.6


---
 libswscale/x86/output.asm | 440 +-
 libswscale/x86/swscale.c  |  99 +
 tests/checkasm/Makefile   |   2 +-
 tests/checkasm/checkasm.c |   1 +
 tests/checkasm/checkasm.h |   1 +
 tests/checkasm/sw_gbrp.c  | 198 +
 tests/fate/checkasm.mak   |   1 +
 7 files changed, 740 insertions(+), 2 deletions(-)
 create mode 100644 tests/checkasm/sw_gbrp.c

diff --git a/libswscale/x86/output.asm b/libswscale/x86/output.asm
index 52cf9f2c2e..e80b6256b4 100644
--- a/libswscale/x86/output.asm
+++ b/libswscale/x86/output.asm
@@ -38,7 +38,49 @@ pw_32: times 8 dw 32
 pd_255:times 8 dd 255
 pw_512:times 8 dw 512
 pw_1024:   times 8 dw 1024
-
+pd_65535_invf: times 8 dd 0x37800080 ;1.0/65535.0
+pd_yuv2gbrp16_start:   times 8 dd -0x4000
+pd_yuv2gbrp_y_start:   times 8 dd  (1 << 9)
+pd_yuv2gbrp_uv_start:  times 8 dd  ((1 << 9) - (128 << 19))
+pd_yuv2gbrp_a_start:   times 8 dd  (1 << 18)
+pd_yuv2gbrp16_offset:  times 8 dd  0x1  ;(1 << 16)
+pd_yuv2gbrp16_round13: times 8 dd  0x02000  ;(1 << 13)
+pd_yuv2gbrp16_a_offset:times 8 dd  0x20002000
+pd_yuv2gbrp16_upper30: times 8 dd  0x3FFFFFFF ;(1<<30) - 1
+pd_yuv2gbrp16_upper27: times 8 dd  0x07FFFFFF ;(1<<27) - 1
+pd_yuv2gbrp16_upperC:  times 8 dd  0xC000
+pb_lo_pack_shuffle8:db  0,  4,  8, 12, \
+   -1, -1, -1, -1, \
+   -1, -1, -1, -1, \
+   -1, -1, -1, -1
+pb_hi_pack_shuffle8:db -1, -1, -1, -1,

[FFmpeg-devel] [PATCH 6/6][BROKEN] avfilter/vf_nlmeans: add x86 SIMD

2021-10-24 Thread Paul B Mahol

Signed-off-by: Paul B Mahol 
---
 libavfilter/vf_nlmeans.c   |  3 ++
 libavfilter/vf_nlmeans.h   |  1 +
 libavfilter/x86/Makefile   |  2 +
 libavfilter/x86/vf_nlmeans.asm | 89 ++
 4 files changed, 95 insertions(+)
 create mode 100644 libavfilter/x86/vf_nlmeans.asm

diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index 93a14bcf19..16171d830a 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -513,6 +513,9 @@ void ff_nlmeans_init(NLMeansDSPContext *dsp)
 
     if (ARCH_AARCH64)
         ff_nlmeans_init_aarch64(dsp);
+
+    if (ARCH_X86)
+        ff_nlmeans_init_x86(dsp);
 }
 
 static av_cold int init(AVFilterContext *ctx)
diff --git a/libavfilter/vf_nlmeans.h b/libavfilter/vf_nlmeans.h
index d0d0056163..ae9f450dbf 100644
--- a/libavfilter/vf_nlmeans.h
+++ b/libavfilter/vf_nlmeans.h
@@ -45,5 +45,6 @@ typedef struct NLMeansDSPContext {
 
 void ff_nlmeans_init(NLMeansDSPContext *dsp);
 void ff_nlmeans_init_aarch64(NLMeansDSPContext *dsp);
+void ff_nlmeans_init_x86(NLMeansDSPContext *dsp);
 
 #endif /* AVFILTER_NLMEANS_H */
diff --git a/libavfilter/x86/Makefile b/libavfilter/x86/Makefile
index a29941eaeb..e87481bd7a 100644
--- a/libavfilter/x86/Makefile
+++ b/libavfilter/x86/Makefile
@@ -20,6 +20,7 @@ OBJS-$(CONFIG_LIMITER_FILTER)+= x86/vf_limiter_init.o
 OBJS-$(CONFIG_LUT3D_FILTER)  += x86/vf_lut3d_init.o
 OBJS-$(CONFIG_MASKEDCLAMP_FILTER)+= x86/vf_maskedclamp_init.o
 OBJS-$(CONFIG_MASKEDMERGE_FILTER)+= x86/vf_maskedmerge_init.o
+OBJS-$(CONFIG_NLMEANS_FILTER)+= x86/vf_nlmeans_init.o
 OBJS-$(CONFIG_NOISE_FILTER)  += x86/vf_noise.o
 OBJS-$(CONFIG_OVERLAY_FILTER)+= x86/vf_overlay_init.o
 OBJS-$(CONFIG_PP7_FILTER)+= x86/vf_pp7_init.o
@@ -61,6 +62,7 @@ X86ASM-OBJS-$(CONFIG_LIMITER_FILTER) += x86/vf_limiter.o
 X86ASM-OBJS-$(CONFIG_LUT3D_FILTER)   += x86/vf_lut3d.o
 X86ASM-OBJS-$(CONFIG_MASKEDCLAMP_FILTER) += x86/vf_maskedclamp.o
 X86ASM-OBJS-$(CONFIG_MASKEDMERGE_FILTER) += x86/vf_maskedmerge.o
+X86ASM-OBJS-$(CONFIG_NLMEANS_FILTER) += x86/vf_nlmeans.o
 X86ASM-OBJS-$(CONFIG_OVERLAY_FILTER) += x86/vf_overlay.o
 X86ASM-OBJS-$(CONFIG_PP7_FILTER) += x86/vf_pp7.o
 X86ASM-OBJS-$(CONFIG_PSNR_FILTER)+= x86/vf_psnr.o
diff --git a/libavfilter/x86/vf_nlmeans.asm b/libavfilter/x86/vf_nlmeans.asm
new file mode 100644
index 0000000000..aebcc59b54
--- /dev/null
+++ b/libavfilter/x86/vf_nlmeans.asm
@@ -0,0 +1,89 @@
+;*
+;* x86-optimized functions for nlmeans filter
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;**
+
+
+%include "libavutil/x86/x86util.asm"
+
+%if HAVE_AVX2_EXTERNAL
+
+SECTION_RODATA
+
+SECTION .text
+
+; void ff_compute_weights_line(const uint32_t *const iia,
+;  const uint32_t *const iib,
+;  const uint32_t *const iid,
+;  const uint32_t *const iie,
+;  const uint8_t *const src,
+;  struct weighted_avg *wa,
+;  const float *const lut,
+;  int max,
+;  int startx, int endx);
+
+INIT_YMM avx2
+cglobal compute_weights_line, 11, 11, 7, iia, iib, iid, iie, src, wa, lut, max, startx, endx, x
+    movsxdifnidn startxq, startxd
+    movsxdifnidn   endxq, endxd
+    movsxdifnidn    maxq, maxd
+
+    sal            startxq, 2
+    sal              endxq, 2
+
+    mov                 xq, startxq
+    sar            startxq, 2
+    VBROADCASTI128      m4, maxm
+    pcmpeqd             m5, m5
+
+.loop:
+    movu                m0, [iieq + xq]
+    movu                m1, [iidq + xq]
+    movu                m2, [iibq + xq]
+    movu                m3, [iiaq + xq]
+    vpmovzxbd           m6, [srcq + startxq]
+    vcvtdq2ps           m6, m6
+
+    psubd               m0, m1
+    psubd               m0, m2
+    paddd               m0, m3
+    pminud              m0, m4
+    pslld               m0, 2
+    mova                m3, m5
+    vpgatherdd          m1, [lutq + m0], m3
+
+vmulps

[FFmpeg-devel] [PATCH 5/6] avfilter/vf_nlmeans: refactor line processing in preparation for x86 SIMD assembly

2021-10-24 Thread Paul B Mahol

Signed-off-by: Paul B Mahol 
---
 libavfilter/vf_nlmeans.c | 109 ++-
 libavfilter/vf_nlmeans.h |  14 +
 2 files changed, 77 insertions(+), 46 deletions(-)

diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index af165c861c..93a14bcf19 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -38,11 +38,6 @@
 #include "vf_nlmeans.h"
 #include "video.h"
 
-struct weighted_avg {
-float total_weight;
-float sum;
-};
-
 typedef struct NLMeansContext {
 const AVClass *class;
 int nb_planes;
@@ -329,6 +324,58 @@ struct thread_data {
 int p;
 };
 
+static void compute_weights_line_c(const uint32_t *const iia,
+                                   const uint32_t *const iib,
+                                   const uint32_t *const iid,
+                                   const uint32_t *const iie,
+                                   const uint8_t *const src,
+                                   struct weighted_avg *wa,
+                                   const float *const weight_lut,
+                                   int max_meaningful_diff,
+                                   int startx, int endx)
+{
+    for (int x = startx; x < endx; x++) {
+        /*
+         * M is a discrete map where every entry contains the sum of all the entries
+         * in the rectangle from the top-left origin of M to its coordinate. In the
+         * following schema, "i" contains the sum of the whole map:
+         *
+         * M = +----------+-----------------+----+
+         *     |          |                 |    |
+         *     |          |                 |    |
+         *     |         a|                b|   c|
+         *     +----------+-----------------+----+
+         *     |          |                 |    |
+         *     |          |                 |    |
+         *     |          |        X        |    |
+         *     |          |                 |    |
+         *     |         d|                e|   f|
+         *     +----------+-----------------+----+
+         *     |          |                 |    |
+         *     |         g|                h|   i|
+         *     +----------+-----------------+----+
+         *
+         * The sum of the X box can be calculated with:
+         *    X = e-d-b+a
+         *
+         * See https://en.wikipedia.org/wiki/Summed_area_table
+         *
+         * The compute*_ssd functions compute the integral image M where every entry
+         * contains the sum of the squared difference of every corresponding pixels of
+         * two input planes of the same size as M.
+         */
+        const uint32_t a = iia[x];
+        const uint32_t b = iib[x];
+        const uint32_t d = iid[x];
+        const uint32_t e = iie[x];
+        const uint32_t patch_diff_sq = FFMIN(e - d - b + a, max_meaningful_diff);
+        const float weight = weight_lut[patch_diff_sq]; // exp(-patch_diff_sq * s->pdiff_scale)
+
+        wa[x].total_weight += weight;
+        wa[x].sum += weight * src[x];
+    }
+}
+
 static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
 {
     NLMeansContext *s = ctx->priv;
@@ -346,50 +393,19 @@ static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs
     const int dist_d = dist_b * s->ii_lz_32;
     const int dist_e = dist_d + dist_b;
     const float *const weight_lut = s->weight_lut;
+    NLMeansDSPContext *dsp = &s->dsp;
 
     for (int y = starty; y < endy; y++) {
-        const uint8_t *src = td->src + y*src_linesize;
+        const uint8_t *const src = td->src + y*src_linesize;
         struct weighted_avg *wa = s->wa + y*s->wa_linesize;
-        for (int x = td->startx; x < td->endx; x++) {
-            /*
-             * M is a discrete map where every entry contains the sum of all the entries
-             * in the rectangle from the top-left origin of M to its coordinate. In the
-             * following schema, "i" contains the sum of the whole map:
-             *
-             * M = +----------+-----------------+----+
-             *     |          |                 |    |
-             *     |          |                 |    |
-             *     |         a|                b|   c|
-             *     +----------+-----------------+----+
-             *     |          |                 |    |
-             *     |          |                 |    |
-             *     |          |        X        |    |
-             *     |          |                 |    |
-             *     |         d|                e|   f|
-             *     +----------+-----------------+----+
-             *     |          |                 |    |
-             *     |         g|                h|   i|
-             *     +----------+-----------------+----+
-             *
-             * The sum of the X box can be calculated with:
-             *    X = e-d-b+a
-             *
- * See https://en.wikipedia.o
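
The `X = e-d-b+a` identity from the comment in the patch above can be tried in isolation. What follows is only a minimal sketch, not code from the patch: `build_integral` and `box_sum` are hypothetical helper names, the table sums plain pixel values rather than squared differences, and it uses one extra zero row and column so that edge boxes need no special casing.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Build an integral image with one extra zero row and column, so that
 * ii[(y + 1) * (w + 1) + (x + 1)] holds the sum of src over [0..x] x [0..y].
 * ii must have room for (w + 1) * (h + 1) entries.
 */
static void build_integral(uint32_t *ii, const uint8_t *src, int w, int h)
{
    const int stride = w + 1;

    for (int x = 0; x <= w; x++)
        ii[x] = 0;
    for (int y = 0; y < h; y++) {
        uint32_t row = 0; /* running sum of the current source row */

        ii[(y + 1) * stride] = 0;
        for (int x = 0; x < w; x++) {
            row += src[y * w + x];
            ii[(y + 1) * stride + x + 1] = ii[y * stride + x + 1] + row;
        }
    }
}

/* Sum of src over the box [x0, x1) x [y0, y1), using X = e - d - b + a. */
static uint32_t box_sum(const uint32_t *ii, int w,
                        int x0, int y0, int x1, int y1)
{
    const int stride = w + 1;
    const uint32_t a = ii[y0 * stride + x0]; /* top-left corner     */
    const uint32_t b = ii[y0 * stride + x1]; /* top-right corner    */
    const uint32_t d = ii[y1 * stride + x0]; /* bottom-left corner  */
    const uint32_t e = ii[y1 * stride + x1]; /* bottom-right corner */

    return e - d - b + a;
}
```

Once the table is built, each box sum costs four loads regardless of the box size, which is why the per-pixel loop in the patch only reads `iia`, `iib`, `iid` and `iie`.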

[FFmpeg-devel] [PATCH 4/6] avfilter/vf_nlmeans: avoid if () to help parallelization

2021-10-24 Thread Paul B Mahol

Signed-off-by: Paul B Mahol 
---
 libavfilter/vf_nlmeans.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index d5a71291af..af165c861c 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -332,6 +332,7 @@ struct thread_data {
 static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
 {
     NLMeansContext *s = ctx->priv;
+    const uint32_t max_meaningful_diff = s->max_meaningful_diff;
     const struct thread_data *td = arg;
     const ptrdiff_t src_linesize = td->src_linesize;
     const int process_h = td->endy - td->starty;
@@ -383,13 +384,11 @@ static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs
             const uint32_t b = ii[x + dist_b];
             const uint32_t d = ii[x + dist_d];
             const uint32_t e = ii[x + dist_e];
-            const uint32_t patch_diff_sq = e - d - b + a;
+            const uint32_t patch_diff_sq = FFMIN(e - d - b + a, max_meaningful_diff);
+            const float weight = weight_lut[patch_diff_sq]; // exp(-patch_diff_sq * s->pdiff_scale)
 
-            if (patch_diff_sq < s->max_meaningful_diff) {
-                const float weight = weight_lut[patch_diff_sq]; // exp(-patch_diff_sq * s->pdiff_scale)
-                wa[x].total_weight += weight;
-                wa[x].sum += weight * src[x];
-            }
+            wa[x].total_weight += weight;
+            wa[x].sum += weight * src[x];
         }
         ii += s->ii_lz_32;
     }
@@ -506,7 +505,7 @@ static av_cold int init(AVFilterContext *ctx)
 
     s->pdiff_scale = 1. / (h * h);
     s->max_meaningful_diff = log(255.) / s->pdiff_scale;
-    s->weight_lut = av_calloc(s->max_meaningful_diff, sizeof(*s->weight_lut));
+    s->weight_lut = av_calloc(s->max_meaningful_diff + 1, sizeof(*s->weight_lut));
     if (!s->weight_lut)
         return AVERROR(ENOMEM);
     for (int i = 0; i < s->max_meaningful_diff; i++)
-- 
2.33.0

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
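
The idea behind PATCH 4/6 (clamp the index and let a zero-filled extra LUT entry absorb the out-of-range case) can be sketched outside the filter. This is only an illustration under stated assumptions: `build_weight_lut`, `accumulate` and `MIN_U32` are hypothetical names, and the table is filled with a geometric sequence `decay^i` in place of the patch's `exp(-i * pdiff_scale)`. The two agree when `decay = exp(-pdiff_scale)`, and the substitution keeps the sketch free of libm.

```c
#include <stdint.h>
#include <stdlib.h>

#define MIN_U32(a, b) ((a) < (b) ? (a) : (b))

/*
 * Allocate max_diff + 1 entries. Indices [0, max_diff) hold decay^i; the
 * final entry is left at 0 by calloc(), so a clamped lookup adds nothing.
 */
static float *build_weight_lut(uint32_t max_diff, float decay)
{
    float *lut = calloc((size_t)max_diff + 1, sizeof(*lut));
    float w = 1.0f;

    if (!lut)
        return NULL;
    for (uint32_t i = 0; i < max_diff; i++) {
        lut[i] = w;
        w *= decay;
    }
    return lut;
}

/* Branch-free accumulation: every diff >= max_diff maps to the zero entry. */
static void accumulate(float *total_weight, float *sum, const float *lut,
                       uint32_t max_diff, const uint32_t *diff,
                       const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++) {
        const float weight = lut[MIN_U32(diff[i], max_diff)];

        total_weight[i] += weight;
        sum[i]          += weight * src[i];
    }
}
```

Because the loop body no longer depends on a comparison result, every iteration performs the same loads and arithmetic, which is the shape a compiler or a hand-written SIMD loop needs.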

[FFmpeg-devel] [PATCH 3/6] avfilter/vf_nlmeans: no need to print filter options at info level

2021-10-24 Thread Paul B Mahol

Signed-off-by: Paul B Mahol 
---
 libavfilter/vf_nlmeans.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index 0962056a6e..d5a71291af 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -526,7 +526,7 @@ static av_cold int init(AVFilterContext *ctx)
     s->patch_hsize       = s->patch_size       / 2;
     s->patch_hsize_uv    = s->patch_size_uv    / 2;
 
-    av_log(ctx, AV_LOG_INFO, "Research window: %dx%d / %dx%d, patch size: %dx%d / %dx%d\n",
+    av_log(ctx, AV_LOG_DEBUG, "Research window: %dx%d / %dx%d, patch size: %dx%d / %dx%d\n",
            s->research_size, s->research_size, s->research_size_uv, s->research_size_uv,
            s->patch_size,    s->patch_size,    s->patch_size_uv,    s->patch_size_uv);
 
-- 
2.33.0


[FFmpeg-devel] [PATCH 2/6] avfilter/vf_nlmeans: make access to pointer to lut faster

2021-10-24 Thread Paul B Mahol

Signed-off-by: Paul B Mahol 
---
 libavfilter/vf_nlmeans.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index b8d8bb2ec0..0962056a6e 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -344,6 +344,7 @@ static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs
     const int dist_b = 2*p + 1;
     const int dist_d = dist_b * s->ii_lz_32;
     const int dist_e = dist_d + dist_b;
+    const float *const weight_lut = s->weight_lut;
 
     for (int y = starty; y < endy; y++) {
         const uint8_t *src = td->src + y*src_linesize;
@@ -385,7 +386,7 @@ static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs
             const uint32_t patch_diff_sq = e - d - b + a;
 
             if (patch_diff_sq < s->max_meaningful_diff) {
-                const float weight = s->weight_lut[patch_diff_sq]; // exp(-patch_diff_sq * s->pdiff_scale)
+                const float weight = weight_lut[patch_diff_sq]; // exp(-patch_diff_sq * s->pdiff_scale)
                 wa[x].total_weight += weight;
                 wa[x].sum += weight * src[x];
             }
-- 
2.33.0


[FFmpeg-devel] [PATCH 1/6] avfilter/vf_nlmeans: use friendlier 'for (int ...'

2021-10-24 Thread Paul B Mahol

Signed-off-by: Paul B Mahol 
---
 libavfilter/vf_nlmeans.c | 33 -
 1 file changed, 12 insertions(+), 21 deletions(-)

diff --git a/libavfilter/vf_nlmeans.c b/libavfilter/vf_nlmeans.c
index 74fc3923b3..b8d8bb2ec0 100644
--- a/libavfilter/vf_nlmeans.c
+++ b/libavfilter/vf_nlmeans.c
@@ -101,14 +101,13 @@ static void compute_safe_ssd_integral_image_c(uint32_t *dst, ptrdiff_t dst_lines
                                               const uint8_t *s2, ptrdiff_t linesize2,
                                               int w, int h)
 {
-    int x, y;
     const uint32_t *dst_top = dst - dst_linesize_32;
 
     /* SIMD-friendly assumptions allowed here */
     av_assert2(!(w & 0xf) && w >= 16 && h >= 1);
 
-    for (y = 0; y < h; y++) {
-        for (x = 0; x < w; x += 4) {
+    for (int y = 0; y < h; y++) {
+        for (int x = 0; x < w; x += 4) {
             const int d0 = s1[x] - s2[x];
             const int d1 = s1[x + 1] - s2[x + 1];
             const int d2 = s1[x + 2] - s2[x + 2];
@@ -161,14 +160,12 @@ static inline void compute_unsafe_ssd_integral_image(uint32_t *dst, ptrdiff_t ds
                                                      int offx, int offy, int r, int sw, int sh,
                                                      int w, int h)
 {
-    int x, y;
-
-    for (y = starty; y < starty + h; y++) {
+    for (int y = starty; y < starty + h; y++) {
         uint32_t acc = dst[y*dst_linesize_32 + startx - 1] - dst[(y-1)*dst_linesize_32 + startx - 1];
         const int s1y = av_clip(y -  r, 0, sh - 1);
         const int s2y = av_clip(y - (r + offy), 0, sh - 1);
 
-        for (x = startx; x < startx + w; x++) {
+        for (int x = startx; x < startx + w; x++) {
             const int s1x = av_clip(x -  r, 0, sw - 1);
             const int s2x = av_clip(x - (r + offx), 0, sw - 1);
             const uint8_t v1 = src[s1y*linesize + s1x];
@@ -334,7 +331,6 @@ struct thread_data {
 
 static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
 {
-    int x, y;
     NLMeansContext *s = ctx->priv;
     const struct thread_data *td = arg;
     const ptrdiff_t src_linesize = td->src_linesize;
@@ -349,10 +345,10 @@ static int nlmeans_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs
     const int dist_d = dist_b * s->ii_lz_32;
     const int dist_e = dist_d + dist_b;
 
-    for (y = starty; y < endy; y++) {
+    for (int y = starty; y < endy; y++) {
         const uint8_t *src = td->src + y*src_linesize;
         struct weighted_avg *wa = s->wa + y*s->wa_linesize;
-        for (x = td->startx; x < td->endx; x++) {
+        for (int x = td->startx; x < td->endx; x++) {
             /*
              * M is a discrete map where every entry contains the sum of all the entries
              * in the rectangle from the top-left origin of M to its coordinate. In the
@@ -404,10 +400,8 @@ static void weight_averages(uint8_t *dst, ptrdiff_t dst_linesize,
                             struct weighted_avg *wa, ptrdiff_t wa_linesize,
                             int w, int h)
 {
-    int x, y;
-
-    for (y = 0; y < h; y++) {
-        for (x = 0; x < w; x++) {
+    for (int y = 0; y < h; y++) {
+        for (int x = 0; x < w; x++) {
             // Also weight the centered pixel
             wa[x].total_weight += 1.f;
             wa[x].sum += 1.f * src[x];
@@ -423,7 +417,6 @@ static int nlmeans_plane(AVFilterContext *ctx, int w, int h, int p, int r,
                          uint8_t *dst, ptrdiff_t dst_linesize,
                          const uint8_t *src, ptrdiff_t src_linesize)
 {
-    int offx, offy;
     NLMeansContext *s = ctx->priv;
     /* patches center points cover the whole research window so the patches
      * themselves overflow the research window */
@@ -433,8 +426,8 @@ static int nlmeans_plane(AVFilterContext *ctx, int w, int h, int p, int r,
 
     memset(s->wa, 0, s->wa_linesize * h * sizeof(*s->wa));
 
-    for (offy = -r; offy <= r; offy++) {
-        for (offx = -r; offx <= r; offx++) {
+    for (int offy = -r; offy <= r; offy++) {
+        for (int offx = -r; offx <= r; offx++) {
             if (offx || offy) {
                 struct thread_data td = {
                     .src  = src + offy*src_linesize + offx,
@@ -464,7 +457,6 @@ static int nlmeans_plane(AVFilterContext *ctx, int w, int h, int p, int r,
 
 static int filter_frame(AVFilterLink *inlink, AVFrame *in)
 {
-    int i;
     AVFilterContext *ctx = inlink->dst;
     NLMeansContext *s = ctx->priv;
     AVFilterLink *outlink = ctx->outputs[0];
@@ -476,7 +468,7 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *in)
     }
     av_frame_copy_props(out, in);
 
-    for (i = 0; i < s->nb_planes; i++) {
+    for (int i = 0; i < s->nb_planes; i++) {
         const int w = i ? s->chroma_w  : inlink->w;
         const int h = i ? s->chroma_h  : inlink->h;
 const int p = i ? s->patch_hsize_uv: s->pa

Re: [FFmpeg-devel] about CRI HCA encoder

2021-10-24 Thread rin tec

On Mon, Oct 25, 2021 at 12:43 AM Paul B Mahol wrote:

> On Sun, Oct 24, 2021 at 6:39 PM rin tec wrote:
>
> > Hi,
> >
> > HCA is an audio codec developed by CRI.
> >
> > I noticed that the FFmpeg project has had an HCA decoder since March 2020,
> > but it has never had an encoder.
> >
> > So I would like to know why FFmpeg doesn't have an encoder. Is it a patent
> > or copyright problem? After all, HCA is a proprietary codec owned by CRI.
> >
> > Or is it just that no developer has had the time to write the encoder?
> > If so, maybe I can develop it, and I would be very happy to do so!
> >
>
> Lack of interest. Patches welcome.
>
> >
> > I look forward to your reply.
> >
> > Thanks!

Thank you for your answer! I had always assumed there was some copyright
problem.

So, give me some time to complete it.

Re: [FFmpeg-devel] about CRI HCA encoder

2021-10-24 Thread Paul B Mahol

On Sun, Oct 24, 2021 at 6:39 PM rin tec  wrote:

> Hi,
>
> HCA is an audio codec developed by CRI.
>
> I noticed that the FFmpeg project has had an HCA decoder since March 2020,
> but it has never had an encoder.
>
> So I would like to know why FFmpeg doesn't have an encoder. Is it a patent
> or copyright problem? After all, HCA is a proprietary codec owned by CRI.
>
> Or is it just that no developer has had the time to write the encoder?
> If so, maybe I can develop it, and I would be very happy to do so!
>

Lack of interest. Patches welcome.


>
> I look forward to your reply.
>
> Thanks!

[FFmpeg-devel] about CRI HCA encoder

2021-10-24 Thread rin tec

Hi,

HCA is an audio codec developed by CRI.

I noticed that the FFmpeg project has had an HCA decoder since March 2020,
but it has never had an encoder.

So I would like to know why FFmpeg doesn't have an encoder. Is it a patent
or copyright problem? After all, HCA is a proprietary codec owned by CRI.

Or is it just that no developer has had the time to write the encoder?
If so, maybe I can develop it, and I would be very happy to do so!

I look forward to your reply.

Thanks!

Re: [FFmpeg-devel] [PATCH v3] avcodec/libjxl: add JPEG XL decoding via libjxl

2021-10-24 Thread Michael Niedermayer

On Sat, Oct 23, 2021 at 09:18:46PM -0400, Leo Izen wrote:
> On Fri, Oct 22, 2021 at 5:38 PM Michael Niedermayer
>  wrote:
> > They are 2 separate libraries, and distributions often ship them in
> > separate packages. libavformat can depend on libavcodec, but not the
> > other way around. libavcodec could get upgraded without libavformat.
> > Separate patches make it more natural to keep this clean and working;
> > if a patch changes both, it's a hint that something might be intermingled.
> I see, I'll split it into a libavcodec patch that handles the codec
> only, and then a libavformat patch for the muxers/demuxers.
> 
> > Is there some public format spec of this?
> I will have to reverse-engineer the libjxl code, since the spec is
> copyrighted and I can't access it. It might be available for a number
> of CHF but I can't afford it.
> 
> Speaking of the bitstream format, this is starting to get a lot of
> code in the prober, mostly because of how permissive JXL codestreams
> are. I'm thinking that it might make sense to write a parser for JPEG
> XL in libavcodec, and then have libavformat's jpegxl prober just call
> some of that same code. Thoughts?

possible, it does add some extra complexity though

thx

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope
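
For readers following the prober discussion above, here is a minimal sketch of the kind of helper that a libavcodec parser and a libavformat prober could both call. It is an illustration only and not code from either patch: `jxl_probe_signature` and the `jxl_kind` enum are hypothetical names. It checks just the two well-known JPEG XL signatures (the 2-byte bare codestream marker and the 12-byte container box), so a real prober would need further header validation, since two bytes alone are a weak match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum jxl_kind { JXL_NONE = 0, JXL_CODESTREAM, JXL_CONTAINER };

/* Classify a buffer by its leading signature bytes only. */
static enum jxl_kind jxl_probe_signature(const uint8_t *buf, size_t size)
{
    /* ISOBMFF-style signature box: size 12, type "JXL ", then 0D 0A 87 0A */
    static const uint8_t container_sig[12] = {
        0x00, 0x00, 0x00, 0x0C, 'J', 'X', 'L', ' ', 0x0D, 0x0A, 0x87, 0x0A
    };

    if (size >= sizeof(container_sig) &&
        !memcmp(buf, container_sig, sizeof(container_sig)))
        return JXL_CONTAINER;
    /* A bare codestream starts with FF 0A */
    if (size >= 2 && buf[0] == 0xFF && buf[1] == 0x0A)
        return JXL_CODESTREAM;
    return JXL_NONE;
}
```

Keeping such a check in one place means the demuxer's probe function and the codec-side parser cannot drift apart on what counts as a JPEG XL stream.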



Re: [FFmpeg-devel] [PATCH v3 0/3] introduce public AVIOContext::bytes_{read, written}

2021-10-24 Thread Jan Ekström

On Mon, Oct 18, 2021 at 3:47 PM Jan Ekström  wrote:
>
> Changes compared to v2:
> * Written was actually written_size, so it did not take into account any
>   writes after a seek-back. Thus an initial attempt at implementing
>   bytes_written was made.
>
> After a brief discussion with Michael on IRC, this seems to be the idea of a
> path forward that was agreed upon.
>
> 1. AVIOContext::written was supposed to be a private field [1], so move the
>value utilized internally to FFIOContext, and set the AVIOContext value
>from there.
> 2. Deprecate AVIOContext::written.
> 3. Introduce public AVIOContext::bytes_{read,written} statistics fields.
>
> I was not sure whether deprecation or straight-out removal was the right thing
> to do - since while the field was meant to be internal it was not marked as
> such in FFmpeg's merged state of the struct (which is why it did not get
> cleaned up into FFIOContext earlier) - but in order to not get stuck on that,
> I am now posting this with the full deprecation changes. This way (hopefully)
> this change will get more eyeballs and if someone thinks this could just be
> removed the patches can be changed to do that instead.
>
> [1] http://git.videolan.org/?p=ffmpeg.git;a=commit;h=3f75e5116b900f1428aa13041fc7d6301bf1988a
>

Thanks for the comments, applied the set as
d39b58dc32b5fc7b480eeb9ef00a610732f02c2c
a5622ed16f8e22a80cecd8936799e61f61a74cd5
682bafdb12507ec8b049ecbbe2e48bf814927002

with one minor change, "point of truth" -> "source of truth" in the
first commit.

Jan
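
The distinction drawn in the quoted cover letter, between a "written size" that ignores writes after a seek-back and a cumulative byte count, can be modelled with a toy context. This is a sketch of one reading of that description, not FFmpeg code: `struct toy_io`, `toy_write` and `toy_seek` are hypothetical and are not part of the `AVIOContext` API.

```c
#include <stdint.h>

/* Hypothetical stand-in for an I/O context; not FFmpeg's AVIOContext. */
struct toy_io {
    int64_t pos;           /* current write position                      */
    int64_t written_size;  /* furthest position ever reached (file size)  */
    int64_t bytes_written; /* cumulative bytes written, rewrites included */
};

/* Write len bytes at the current position. */
static void toy_write(struct toy_io *io, int64_t len)
{
    io->pos           += len;
    io->bytes_written += len;
    if (io->pos > io->written_size)
        io->written_size = io->pos;
}

/* Move the write position without changing either counter. */
static void toy_seek(struct toy_io *io, int64_t pos)
{
    io->pos = pos;
}
```

Writing 100 bytes, seeking back to offset 0 and rewriting a 16-byte header leaves `written_size` at 100 while `bytes_written` reaches 116, which is the gap a statistics field has to account for.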
