Re: [PATCH 2/4] Accelerated CRC T10 DIF computation with PCLMULQDQ instruction

Tim Chen Wed, 17 Apr 2013 11:20:21 -0700

On Wed, 2013-04-17 at 20:58 +0300, Jussi Kivilinna wrote:
> On 16.04.2013 19:20, Tim Chen wrote:
> > This is the x86_64 CRC T10 DIF transform accelerated with the PCLMULQDQ
> > instructions.  Details discussing the implementation can be found in the
> > paper:
> > 
> > "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
> > URL: http://download.intel.com/design/intarch/papers/323102.pdf
> 
> URL does not work.


Thanks for catching this. Will update.
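
Until the link is fixed, the accelerated routine can still be checked
against a plain bit-at-a-time reference. A minimal sketch in C, assuming
the usual T10 DIF parameters (polynomial 0x8BB7, zero initial value, no
bit reflection, no final XOR). The name crc_t10dif_ref is hypothetical
and is not part of this patch series:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * T10 DIF generator polynomial (assumed standard value):
 * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
 */
#define T10DIF_POLY 0x8BB7u

/*
 * Hypothetical reference helper: processes one bit at a time,
 * most significant bit first, with no reflection and no final XOR.
 * Slow, but easy to audit against a folded implementation.
 */
uint16_t crc_t10dif_ref(uint16_t crc, const unsigned char *buf, size_t len)
{
	while (len--) {
		/* Bring the next message byte into the top of the register. */
		crc ^= (uint16_t)((unsigned int)*buf++ << 8);

		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000u)
				crc = (uint16_t)((crc << 1) ^ T10DIF_POLY);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}
```

Feeding the same buffers and initial values to both routines and
comparing the 16-bit results should be enough to catch folding or
reduction mistakes.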

> 
> > 
> > Signed-off-by: Tim Chen <tim.c.c...@linux.intel.com>
> > Tested-by: Keith Busch <keith.bu...@intel.com>
> > ---
> >  arch/x86/crypto/crct10dif-pcl-asm_64.S | 659 +++++++++++++++++++++++++++++++++
> >  1 file changed, 659 insertions(+)
> >  create mode 100644 arch/x86/crypto/crct10dif-pcl-asm_64.S
> <snip>
> > +
> > +   # Allocate Stack Space
> > +   mov     %rsp, %rcx
> > +   sub     $16*10, %rsp
> > +   and     $~(0x20 - 1), %rsp
> > +
> > +   # push the xmm registers into the stack to maintain
> > +   movdqa %xmm10, 16*2(%rsp)
> > +   movdqa %xmm11, 16*3(%rsp)
> > +   movdqa %xmm8 , 16*4(%rsp)
> > +   movdqa %xmm12, 16*5(%rsp)
> > +   movdqa %xmm13, 16*6(%rsp)
> > +   movdqa %xmm6,  16*7(%rsp)
> > +   movdqa %xmm7,  16*8(%rsp)
> > +   movdqa %xmm9,  16*9(%rsp)
> 
> You don't need to store (and restore) these, as 'crc_t10dif_pcl' is called
> between kernel_fpu_begin/_end.

That's true.  Will skip the xmm save/restore in the updated patch.
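
For reference, the calling pattern being relied on looks roughly like
the sketch below. It is a userspace mock, not the glue code from this
series: kernel_fpu_begin/kernel_fpu_end and crc_t10dif_pcl are replaced
by stand-ins, the prototype of crc_t10dif_pcl is an assumption, and
crc_t10dif_update_sketch, fpu_section_depth and saw_fpu_section are
hypothetical names used only for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins (assumptions) for the kernel's FPU section markers. */
int fpu_section_depth;
void kernel_fpu_begin(void) { fpu_section_depth++; }
void kernel_fpu_end(void)   { fpu_section_depth--; }

/* Set by the stand-in routine when it runs inside an FPU section. */
int saw_fpu_section;

/*
 * Stand-in for the assembly routine; the prototype is assumed.
 * It computes nothing and only records whether the caller had
 * already opened the FPU section when it was entered.
 */
uint16_t crc_t10dif_pcl(uint16_t crc, const unsigned char *buf, size_t len)
{
	(void)buf;
	(void)len;
	saw_fpu_section = (fpu_section_depth > 0);
	return crc;
}

/*
 * Hypothetical caller: it owns the FPU/SIMD state for the whole
 * call, so the routine itself has no need to save or restore the
 * xmm registers it clobbers.
 */
uint16_t crc_t10dif_update_sketch(uint16_t crc, const unsigned char *buf,
				  size_t len)
{
	kernel_fpu_begin();
	crc = crc_t10dif_pcl(crc, buf, len);
	kernel_fpu_end();
	return crc;
}
```

The point is only that the save/restore responsibility sits with the
caller, which is why the movdqa spills in the assembly are redundant.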

> 
> > +
> > +
> > +   # check if smaller than 256
> > +   cmp     $256, arg3
> > +
> <snip>
> > +_cleanup:
> > +   # scale the result back to 16 bits
> > +   shr     $16, %eax
> > +   movdqa  16*2(%rsp), %xmm10
> > +   movdqa  16*3(%rsp), %xmm11
> > +   movdqa  16*4(%rsp), %xmm8
> > +   movdqa  16*5(%rsp), %xmm12
> > +   movdqa  16*6(%rsp), %xmm13
> > +   movdqa  16*7(%rsp), %xmm6
> > +   movdqa  16*8(%rsp), %xmm7
> > +   movdqa  16*9(%rsp), %xmm9
> 
> Registers are overwritten by kernel_fpu_end.
> 
> > +   mov     %rcx, %rsp
> > +   ret
> > +ENDPROC(crc_t10dif_pcl)
> > +
> 
> You should move ENDPROC at end of the full function.
> 
> > +########################################################################
> > +
> > +.align 16
> > +_less_than_128:
> > +
> > +   # check if there is enough buffer to be able to fold 16B at a time
> > +   cmp     $32, arg3
> <snip>
> > +   movdqa  (%rsp), %xmm7
> > +   pshufb  %xmm11, %xmm7
> > +   pxor    %xmm0 , %xmm7   # xor the initial crc value
> > +
> > +   psrldq  $7, %xmm7
> > +
> > +   jmp     _barrett
> 
> Move ENDPROC here.
> 

Will do.

Tim



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html