Hi Alex,

    Thanks for your reply.
    I tested your correction. The result is correct.
    The C language will cause this function(csum_tcpudp_nofold) become
176 MIPS instructions. The assemble code is 150 MIPS instruction.
    If the MIPS CPU is running as 1GHz, C language cause more 60 nano
seconds during send/receive each tcp/udp packet. I'm not sure whether
it will cause any negative result if the frequency of CPU was lower.
MIPS arch is usually used in networking equipments.
    I think Ralf's correction is better.

    - Mark

Signed-off-by: Ralf Baechle <r...@linux-mips.org>

 arch/mips/include/asm/checksum.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/mips/include/asm/checksum.h b/arch/mips/include/asm/checksum.h
index 5c585c5..08b6ba3 100644
--- a/arch/mips/include/asm/checksum.h
+++ b/arch/mips/include/asm/checksum.h
@@ -186,7 +186,9 @@ static inline __wsum csum_tcpudp_nofold(__be32 saddr,
        "       daddu   %0, %4          \n"
        "       dsll32  $1, %0, 0       \n"
        "       daddu   %0, $1          \n"
+       "       sltu    $1, %0, $1      \n"
        "       dsra32  %0, %0, 0       \n"
+       "       daddu   %0, $1          \n"
 #endif
        "       .set    pop"
        : "=r" (sum)


2017-01-26 2:13 GMT+08:00 Alexander Duyck <alexander.du...@gmail.com>:
> On Tue, Jan 24, 2017 at 8:35 PM, Mark Zhang <bomb.zh...@gmail.com> wrote:
>> If the input parameters as saddr = 0xc0a8fd60,daddr = 0xc0a8fda1,len =
>> 80, proto = 17, sum =0x7eae049d.
>> The correct result should be 1, but original function return 0.
>>
>> Attached the correction patch.
>
> I've copied your patch here:
>
> From 52e265f7fe0acf9a6e9c4346e1fe6fa994aa00b6 Mon Sep 17 00:00:00 2001
> From: qzhang <qin.2.zh...@nsn.com>
> Date: Wed, 25 Jan 2017 12:25:25 +0800
> Subject: [PATCH] Fixed the mips 64bits checksum error -- csum_tcpudp_nofold
>
> ---
>  arch/mips/include/asm/checksum.h |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/arch/mips/include/asm/checksum.h 
> b/arch/mips/include/asm/checksum.h
> index 7749daf..0e351c5 100644
> --- a/arch/mips/include/asm/checksum.h
> +++ b/arch/mips/include/asm/checksum.h
> @@ -184,6 +184,10 @@ static inline __wsum csum_tcpudp_nofold(__be32
> saddr, __be32 daddr,
>   " daddu %0, %2 \n"
>   " daddu %0, %3 \n"
>   " daddu %0, %4 \n"
> + " dsrl32  $1, %0, 0 \n"
> + " dsll32 %0, %0, 0 \n"
> + " dsrl32 %0, %0, 0 \n"
> + " daddu   %0, $1 \n"
>   " dsll32 $1, %0, 0 \n"
>   " daddu %0, $1 \n"
>   " dsra32 %0, %0, 0 \n"
>
> I agree there does appear to be a bug in the code, and my
> understanding of MIPS assembly is limited, but I don't think your
> patch truly fixes it.  From what I can understand it seems like you
> would just be shifting the register called out at %0 past 64 bits
> which would just zero it out.
>
> Below is the snippet you are updating:
>
>     #ifdef CONFIG_64BIT
>             "       daddu   %0, %2          \n"
>             "       daddu   %0, %3          \n"
>             "       daddu   %0, %4          \n"
>             "       dsll32  $1, %0, 0       \n"
>             "       daddu   %0, $1          \n"
>             "       dsra32  %0, %0, 0       \n"
>     #endif
>
> From what I can tell the issue is that the dsll32 really needs to be
> replaced with some sort of rotation type call instead of a shift, but
> it looks like MIPS doesn't have a rotation instruction.  We need to
> add the combination of a shift left by 32, to a shift right 32, and
> then add that value back to the original register.  Then we will get
> the correct result in the upper 32 bits.
>
> I'm not even sure it is necessary to use inline assembler.  You could
> probably just use the inline assembler for the 32b and change the 64b
> approach over to using C.  The code for it would be pretty simple:
>     unsigned long res = (__force unsigned long)sum;
>
>     res += (__force unsigned long)daddr;
>     res += (__force unsigned long)saddr;
> #ifdef __MIPSEL__
>     res += (proto + len) << 8;
> #else
>     res += proto + len;
> #endif
>     res += (res << 32) | (res >> 32);
>
>     return (__force __wsum)(res >> 32);
>
> That would probably be enough to fix the issue and odds are it should
> generate assembly code very similar to what was already there.
>
> - Alex

Reply via email to