Hi,

The attached patch is low-hanging fruit.

I think the code using the computed values could be improved (e.g. you
probably need only half the GPRs to store the results, and the data can
probably be shuffled more efficiently), but this requires more effort.

I'm mostly submitting it because it still applies, and I can't really
spend more time on it.
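
To illustrate why the masks can go, here is a scalar C sketch, not the
actual deblocking code. It assumes that whatever consumes the GPR later
only looks at the low 16-bit word (as a word unpack or shuffle would);
the helper names movd_low_dword and low_word are made up for this note.

```c
#include <stdint.h>

/* Model of "movd r32, xmm": the GPR receives the low dword of the
 * vector, i.e. two packed 16-bit words. word1 is the leftover upper
 * half that "and reg, 0xffff" used to clear. */
uint32_t movd_low_dword(uint16_t word0, uint16_t word1)
{
    return (uint32_t)word0 | ((uint32_t)word1 << 16);
}

/* Hypothetical consumer: reads only word 0 of the register, which is
 * the assumption this sketch rests on. */
uint16_t low_word(uint32_t gpr)
{
    return (uint16_t)gpr;
}

/* Returns 1 when the masked and unmasked paths give the same word. */
int mask_is_redundant(uint16_t word0, uint16_t word1)
{
    uint32_t raw    = movd_low_dword(word0, word1);
    uint32_t masked = raw & 0xffff;   /* the instruction being removed */
    return low_word(raw) == low_word(masked);
}
```

mask_is_redundant() returns 1 for any pair of inputs, since truncating to
16 bits discards exactly the bits the mask would have zeroed. If some
later use read the full 32-bit register, this would no longer hold.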

-- 
Christophe
From 57819727586c186bfea733a8f06eead22ac6a1f2 Mon Sep 17 00:00:00 2001
From: Christophe Gisquet <christophe.gisq...@gmail.com>
Date: Wed, 23 Jul 2014 23:21:20 +0200
Subject: [PATCH 08/13] x86: hevc_deblock: remove unnecessary masking

The unpacks/shuffles later on make it unnecessary.

Before:
1508 decicycles in h, 2096759 runs, 393 skips
2512 decicycles in v, 2095422 runs, 1730 skips

After:
1477 decicycles in h, 2096745 runs, 407 skips
2484 decicycles in v, 2095297 runs, 1855 skips
---
 libavcodec/x86/hevc_deblock.asm | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/libavcodec/x86/hevc_deblock.asm b/libavcodec/x86/hevc_deblock.asm
index 89c0f9b..7fa0803 100644
--- a/libavcodec/x86/hevc_deblock.asm
+++ b/libavcodec/x86/hevc_deblock.asm
@@ -355,19 +355,15 @@ ALIGN 16
     psrld            m8, 16
     paddw            m8, m10
     movd            r7d, m8
-    and              r7, 0xffff; 1dp0 + 1dp3
     pshufd           m8, m8, 0x4E
     movd            r8d, m8
-    and              r8, 0xffff; 0dp0 + 0dp3
 
     pshufd           m8, m11, 0x31
     psrld            m8, 16
     paddw            m8, m11
     movd            r9d, m8
-    and              r9, 0xffff; 1dq0 + 1dq3
     pshufd           m8, m8, 0x4E
     movd           r10d, m8
-    and             r10, 0xffff; 0dq0 + 0dq3
     ; end calc for weak filter
 
     ; filtering mask
-- 
1.9.2.msysgit.0
