On Thu, 2010-11-04 at 00:38 -0700, Mathias Krause wrote:
> On 03.11.2010, 23:27 Huang Ying wrote:
> > On Wed, 2010-11-03 at 14:14 -0700, Mathias Krause wrote:
> >> The AES-NI instructions are also available in legacy mode so the 32-bit
> >> architecture may profit from those, too.
> >> 
> >> To illustrate the performance gain here's a short summary of the tcrypt
> >> speed test on a Core i7 M620 running at 2.67GHz comparing both assembler
> >> implementations:
> >> 
> >> x86:                              i586       aes-ni   delta
> >> 256 bit, 8kB blocks, ECB:  125.94 MB/s  187.09 MB/s  +48.6%
> > 
> > Which method did you use for speed testing?
> > 
> > modprobe tcrypt mode=200 sec=<?>
> 
> Yes. I used: modprobe tcrypt mode=200 sec=1
> 
> > That actually does not work very well for AES-NI, because the AES-NI
> > blkcipher is tested in synchronous mode, and in that mode,
> > kernel_fpu_begin/end() must be called for every block, and
> > kernel_fpu_begin/end() is quite slow.
> 
> That's what I figured, too. Can this slowdown be avoided by saving and 
> restoring the used FPU registers within the assembler implementation or 
> would this be even slower?

That would amount to a customized version of kernel_fpu_begin/end(), and
I think the x86 maintainers will not like it. The benefit may be small too.

> > At the same time, some further
> > optimizations for AES-NI (such as the "ecb-aes-aesni" driver) cannot be
> > tested in that mode, because they are only available in asynchronous
> > mode.
> 
> After finding the bug in the second version of the patch I noticed this, 
> too.
> 
> > When developing AES-NI for x86_64, I used dm-crypt + AES-NI for speed
> > testing, where AES-NI blkcipher will be tested in asynchronous mode, and
> > kernel_fpu_begin/end() is called for every page. Can you use that to
> > test?
> 
> But wouldn't this be even slower than the above measurement? I took the 
> results for 8kB blocks and a page would only be 4kB ... well, depends on 
> what kind of pages you took. IIRC x86-64 not only supports 2MB but also 
> 1GB pages ;)

There is another difference between them. In synchronous mode
kernel_fpu_begin/end() is called for every block, while in asynchronous
mode with dm-crypt, kernel_fpu_begin/end() is called for every page. So
although the amount of data per request is smaller, the result will be better.
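
As a rough illustration of the counts involved, here is a back-of-envelope
sketch. It assumes that "block" above means one 16-byte AES block and that
a page is 4096 bytes; the helper name fpu_section_count is made up for this
example, and nothing here is measured.

```python
# Back-of-envelope count of kernel_fpu_begin/end() pairs per buffer.
# Assumptions (not measured): one pair per 16-byte AES block in
# synchronous mode, one pair per 4096-byte page in asynchronous mode.
AES_BLOCK_SIZE = 16   # bytes, fixed by the AES standard
PAGE_SIZE = 4096      # bytes, assumed small-page size


def fpu_section_count(buffer_bytes, unit_bytes):
    """Number of begin/end pairs if one pair covers unit_bytes of data."""
    return -(-buffer_bytes // unit_bytes)  # ceiling division


buf = 8 * 1024  # the 8kB tcrypt buffer from the table above
sync_pairs = fpu_section_count(buf, AES_BLOCK_SIZE)
async_pairs = fpu_section_count(buf, PAGE_SIZE)
print(sync_pairs, async_pairs)
```

Under these assumptions the synchronous path enters and leaves the FPU
section far more often for the same amount of data, which is why the
per-page path is expected to come out ahead despite the smaller requests.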

> > Or you can add test_acipher_speed (similar to test_ahash_speed) to
> > test ciphers in asynchronous mode.
> 
> Maybe I'll try this approach, since it looks like just a minor 
> modification of the tcrypt module.

Thanks!

Best Regards,
Huang Ying

