Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86

2010-11-17 Thread Mathias Krause
On 13.11.2010, 00:25 Mathias Krause wrote:
 On 12.11.2010, 08:34 Huang Ying wrote:
 On Fri, 2010-11-12 at 15:30 +0800, Mathias Krause wrote:
 On 12.11.2010, 01:33 Huang Ying wrote:
 Why is the improvement for ECB so small? I cannot understand it. It
 should be as big as for CBC.
 
 I don't know why the ECB variant is so slow compared to the other variants.
 But the same is true for the current x86-64 version; see the values above
 for x86-64 (old). I set up dm-crypt for this test like this:
 # cryptsetup -c aes-ecb-plain -d /dev/urandom create cfs /dev/loop0
 
 What were the numbers you measured in your tests while developing the
 x86-64 version?
 
 I can't remember the numbers. Are you interested in digging into the issue?
 
 I looked at /proc/crypto while doing the tests again and noticed that ECB
 isn't handled using cryptd, while all other modes, e.g. CBC and CTR, are.
 The reason for that seems to be that for ECB, and only for ECB, the kernel
 is using the synchronous block algorithm instead of the asynchronous one.
 So the question is: why is the ECB variant handled using the synchronous
 cipher? Is it because of the missing IV handling in this mode?

Herbert, any idea why this is the case?

Regards,
Mathias

--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86

2010-11-12 Thread Mathias Krause
On 12.11.2010, 08:34 Huang Ying wrote:
On Fri, 2010-11-12 at 15:30 +0800, Mathias Krause wrote:
 On 12.11.2010, 01:33 Huang Ying wrote:
 Why is the improvement for ECB so small? I cannot understand it. It
 should be as big as for CBC.
 
 I don't know why the ECB variant is so slow compared to the other variants.
 But the same is true for the current x86-64 version; see the values above
 for x86-64 (old). I set up dm-crypt for this test like this:
 # cryptsetup -c aes-ecb-plain -d /dev/urandom create cfs /dev/loop0
 
 What were the numbers you measured in your tests while developing the
 x86-64 version?
 
 I can't remember the numbers. Are you interested in digging into the issue?

I looked at /proc/crypto while doing the tests again and noticed that ECB
isn't handled using cryptd, while all other modes, e.g. CBC and CTR, are.
The reason for that seems to be that for ECB, and only for ECB, the kernel
is using the synchronous block algorithm instead of the asynchronous one.
So the question is: why is the ECB variant handled using the synchronous
cipher? Is it because of the missing IV handling in this mode?

Best regards,
Mathias



Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86

2010-11-11 Thread Mathias Krause
Hello Huang Ying,

On 03.11.2010, 23:27 Huang Ying wrote:
 On Wed, 2010-11-03 at 14:14 -0700, Mathias Krause wrote:
 The AES-NI instructions are also available in legacy mode, so the 32-bit
 architecture may profit from those, too.
 
 To illustrate the performance gain, here's a short summary of the tcrypt
 speed test on a Core i7 M620 running at 2.67GHz comparing both assembler
 implementations:
 
 x86:                       i586         aes-ni       delta
 256 bit, 8kB blocks, ECB:  125.94 MB/s  187.09 MB/s  +48.6%
 
 Which method did you use for speed testing?
 
 modprobe tcrypt mode=200 sec=?
 
 That does not actually work very well for AES-NI, because the AES-NI
 blkcipher is tested in synchronous mode, and in that mode
 kernel_fpu_begin/end() must be called for every block, which is quite
 slow. At the same time, some further AES-NI optimizations (such as the
 ecb-aes-aesni driver) cannot be tested in that mode, because they are
 only available in asynchronous mode.
 
 When developing AES-NI for x86_64, I used dm-crypt + AES-NI for speed
 testing, where the AES-NI blkcipher is tested in asynchronous mode and
 kernel_fpu_begin/end() is called for every page. Can you use that to
 test?
 
 Or you can add test_acipher_speed (similar to test_ahash_speed) to
 test ciphers in asynchronous mode.

Here are the numbers for dm-crypt. I ran the tests again on the Core i7
M620, 2.67GHz. During the tests I noticed that not porting the CBC
variant to x86 was a bad idea, so I did that too and got pretty nice
numbers (see v3 vs. v4 of the patch).

All tests were run five times in a row using a 256-bit key and doing I/O
to the block device in chunks of 1MB. The numbers are in MB/s.

x86 (i586 variant):
      1. run  2. run  3. run  4. run  5. run    mean
ECB:    93.9    93.9    94.0    93.5    93.8    93.8
CBC:    84.9    84.8    84.9    84.9    84.8    84.8
XTS:   108.2   108.3   109.6   108.3   108.9   108.6
LRW:   105.0   105.0   105.1   105.1   105.1   105.0

x86 (AES-NI), v3 of the patch:
      1. run  2. run  3. run  4. run  5. run    mean
ECB:   124.8   120.8   124.5   120.6   124.5   123.0
CBC:   112.6   109.6   112.6   110.7   109.4   110.9
XTS:   221.6   221.1   220.9   223.5   224.4   222.3
LRW:   206.2   209.7   207.4   203.7   209.3   207.2

x86 (AES-NI), v4 of the patch:
      1. run  2. run  3. run  4. run  5. run    mean
ECB:   122.5   121.2   121.6   125.7   125.5   123.3
CBC:   259.5   259.2   261.2   264.0   267.6   262.3
XTS:   225.1   230.7   220.6   217.9   216.3   222.1
LRW:   202.7   202.8   210.6   208.9   202.7   205.5

Comparing the CBC values of v3 and v4 of the patch shows that porting
the CBC variant to x86 more than doubled the performance, so the
slightly ugly #ifdef'ed code is worth the effort.

x86-64 (old):
      1. run  2. run  3. run  4. run  5. run    mean
ECB:   121.4   120.9   121.1   121.2   120.9   121.1
CBC:   282.5   286.3   281.5   282.0   294.5   285.3
XTS:   263.6   260.3   263.0   267.0   264.6   263.7
LRW:   249.6   249.8   250.5   253.4   252.2   251.1

x86-64 (new):
      1. run  2. run  3. run  4. run  5. run    mean
ECB:   122.1   122.0   122.0   127.0   121.9   123.0
CBC:   291.2   286.2   295.6   291.4   289.9   290.8
XTS:   263.3   264.4   264.5   264.2   270.4   265.3
LRW:   254.9   252.3   253.6   258.2   257.5   255.3

Comparing the mean values gives us:

x86:      i586  aes-ni     delta
ECB:      93.8   123.3    +31.4%
CBC:      84.8   262.3   +209.3%
XTS:     108.6   222.1   +104.5%
LRW:     105.0   205.5    +95.7%

x86-64:    old     new   delta
ECB:     121.1   123.0   +1.5%
CBC:     285.3   290.8   +1.9%
XTS:     263.7   265.3   +0.6%
LRW:     251.1   255.3   +1.7%

The improvement of the new over the old x86-64 version is not as
drastic as for the synchronous variant (see the tcrypt tests in the
previous email), but it is an improvement nevertheless. The improvement
for x86, however, is significant: it is almost as fast as the x86-64
version.

I'll post the new version of the patch in a follow-up email.


Regards,
Mathias



Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86

2010-11-11 Thread Huang Ying
Hi, Mathias,

On Fri, 2010-11-12 at 06:18 +0800, Mathias Krause wrote:
 All tests were run five times in a row using a 256-bit key and doing I/O
 to the block device in chunks of 1MB. The numbers are in MB/s.
 
 x86 (i586 variant):
       1. run  2. run  3. run  4. run  5. run    mean
 ECB:    93.9    93.9    94.0    93.5    93.8    93.8
 CBC:    84.9    84.8    84.9    84.9    84.8    84.8
 XTS:   108.2   108.3   109.6   108.3   108.9   108.6
 LRW:   105.0   105.0   105.1   105.1   105.1   105.0
 
 x86 (AES-NI), v3 of the patch:
       1. run  2. run  3. run  4. run  5. run    mean
 ECB:   124.8   120.8   124.5   120.6   124.5   123.0
 CBC:   112.6   109.6   112.6   110.7   109.4   110.9
 XTS:   221.6   221.1   220.9   223.5   224.4   222.3
 LRW:   206.2   209.7   207.4   203.7   209.3   207.2
 
 x86 (AES-NI), v4 of the patch:
       1. run  2. run  3. run  4. run  5. run    mean
 ECB:   122.5   121.2   121.6   125.7   125.5   123.3
 CBC:   259.5   259.2   261.2   264.0   267.6   262.3
 XTS:   225.1   230.7   220.6   217.9   216.3   222.1
 LRW:   202.7   202.8   210.6   208.9   202.7   205.5
 
 Comparing the CBC values of v3 and v4 of the patch shows that porting
 the CBC variant to x86 more than doubled the performance, so the
 slightly ugly #ifdef'ed code is worth the effort.
 
 x86-64 (old):
       1. run  2. run  3. run  4. run  5. run    mean
 ECB:   121.4   120.9   121.1   121.2   120.9   121.1
 CBC:   282.5   286.3   281.5   282.0   294.5   285.3
 XTS:   263.6   260.3   263.0   267.0   264.6   263.7
 LRW:   249.6   249.8   250.5   253.4   252.2   251.1
 
 x86-64 (new):
       1. run  2. run  3. run  4. run  5. run    mean
 ECB:   122.1   122.0   122.0   127.0   121.9   123.0
 CBC:   291.2   286.2   295.6   291.4   289.9   290.8
 XTS:   263.3   264.4   264.5   264.2   270.4   265.3
 LRW:   254.9   252.3   253.6   258.2   257.5   255.3
 
 Comparing the mean values gives us:
 
 x86:      i586  aes-ni     delta
 ECB:      93.8   123.3    +31.4%

Why is the improvement for ECB so small? I cannot understand it. It
should be as big as for CBC.

Best Regards,
Huang Ying

 CBC:      84.8   262.3   +209.3%
 XTS:     108.6   222.1   +104.5%
 LRW:     105.0   205.5    +95.7%
 
 x86-64:    old     new   delta
 ECB:     121.1   123.0   +1.5%
 CBC:     285.3   290.8   +1.9%
 XTS:     263.7   265.3   +0.6%
 LRW:     251.1   255.3   +1.7%
 
 The improvement of the new over the old x86-64 version is not as
 drastic as for the synchronous variant (see the tcrypt tests in the
 previous email), but it is an improvement nevertheless. The improvement
 for x86, however, is significant: it is almost as fast as the x86-64
 version.
 
 I'll post the new version of the patch in a follow-up email.
 
 
 Regards,
 Mathias
 




Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86

2010-11-11 Thread Mathias Krause
On 12.11.2010, 01:33 Huang Ying wrote:
 Hi, Mathias,
 
 On Fri, 2010-11-12 at 06:18 +0800, Mathias Krause wrote:
 All tests were run five times in a row using a 256-bit key and doing I/O
 to the block device in chunks of 1MB. The numbers are in MB/s.
 
 x86 (i586 variant):
       1. run  2. run  3. run  4. run  5. run    mean
 ECB:    93.9    93.9    94.0    93.5    93.8    93.8
 CBC:    84.9    84.8    84.9    84.9    84.8    84.8
 XTS:   108.2   108.3   109.6   108.3   108.9   108.6
 LRW:   105.0   105.0   105.1   105.1   105.1   105.0
 
 x86 (AES-NI), v3 of the patch:
       1. run  2. run  3. run  4. run  5. run    mean
 ECB:   124.8   120.8   124.5   120.6   124.5   123.0
 CBC:   112.6   109.6   112.6   110.7   109.4   110.9
 XTS:   221.6   221.1   220.9   223.5   224.4   222.3
 LRW:   206.2   209.7   207.4   203.7   209.3   207.2
 
 x86 (AES-NI), v4 of the patch:
       1. run  2. run  3. run  4. run  5. run    mean
 ECB:   122.5   121.2   121.6   125.7   125.5   123.3
 CBC:   259.5   259.2   261.2   264.0   267.6   262.3
 XTS:   225.1   230.7   220.6   217.9   216.3   222.1
 LRW:   202.7   202.8   210.6   208.9   202.7   205.5
 
 Comparing the CBC values of v3 and v4 of the patch shows that porting
 the CBC variant to x86 more than doubled the performance, so the
 slightly ugly #ifdef'ed code is worth the effort.
 
 x86-64 (old):
       1. run  2. run  3. run  4. run  5. run    mean
 ECB:   121.4   120.9   121.1   121.2   120.9   121.1
 CBC:   282.5   286.3   281.5   282.0   294.5   285.3
 XTS:   263.6   260.3   263.0   267.0   264.6   263.7
 LRW:   249.6   249.8   250.5   253.4   252.2   251.1
 
 x86-64 (new):
       1. run  2. run  3. run  4. run  5. run    mean
 ECB:   122.1   122.0   122.0   127.0   121.9   123.0
 CBC:   291.2   286.2   295.6   291.4   289.9   290.8
 XTS:   263.3   264.4   264.5   264.2   270.4   265.3
 LRW:   254.9   252.3   253.6   258.2   257.5   255.3
 
 Comparing the mean values gives us:
 
 x86:      i586  aes-ni     delta
 ECB:      93.8   123.3    +31.4%
 
 Why is the improvement for ECB so small? I cannot understand it. It
 should be as big as for CBC.

I don't know why the ECB variant is so slow compared to the other variants.
But the same is true for the current x86-64 version; see the values above
for x86-64 (old). I set up dm-crypt for this test like this:
# cryptsetup -c aes-ecb-plain -d /dev/urandom create cfs /dev/loop0

What were the numbers you measured in your tests while developing the
x86-64 version?

Best regards,
Mathias

 
 Best Regards,
 Huang Ying
 
 CBC:      84.8   262.3   +209.3%
 XTS:     108.6   222.1   +104.5%
 LRW:     105.0   205.5    +95.7%
 
 x86-64:    old     new   delta
 ECB:     121.1   123.0   +1.5%
 CBC:     285.3   290.8   +1.9%
 XTS:     263.7   265.3   +0.6%
 LRW:     251.1   255.3   +1.7%
 
 The improvement of the new over the old x86-64 version is not as
 drastic as for the synchronous variant (see the tcrypt tests in the
 previous email), but it is an improvement nevertheless. The improvement
 for x86, however, is significant: it is almost as fast as the x86-64
 version.
 
 I'll post the new version of the patch in a follow-up email.
 
 
 Regards,
 Mathias
 
 


Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86

2010-11-11 Thread Huang Ying
On Fri, 2010-11-12 at 15:30 +0800, Mathias Krause wrote:
 On 12.11.2010, 01:33 Huang Ying wrote:
  Hi, Mathias,
  
  On Fri, 2010-11-12 at 06:18 +0800, Mathias Krause wrote:
  All tests were run five times in a row using a 256-bit key and doing I/O
  to the block device in chunks of 1MB. The numbers are in MB/s.
  
  x86 (i586 variant):
        1. run  2. run  3. run  4. run  5. run    mean
  ECB:    93.9    93.9    94.0    93.5    93.8    93.8
  CBC:    84.9    84.8    84.9    84.9    84.8    84.8
  XTS:   108.2   108.3   109.6   108.3   108.9   108.6
  LRW:   105.0   105.0   105.1   105.1   105.1   105.0
  
  x86 (AES-NI), v3 of the patch:
        1. run  2. run  3. run  4. run  5. run    mean
  ECB:   124.8   120.8   124.5   120.6   124.5   123.0
  CBC:   112.6   109.6   112.6   110.7   109.4   110.9
  XTS:   221.6   221.1   220.9   223.5   224.4   222.3
  LRW:   206.2   209.7   207.4   203.7   209.3   207.2
  
  x86 (AES-NI), v4 of the patch:
        1. run  2. run  3. run  4. run  5. run    mean
  ECB:   122.5   121.2   121.6   125.7   125.5   123.3
  CBC:   259.5   259.2   261.2   264.0   267.6   262.3
  XTS:   225.1   230.7   220.6   217.9   216.3   222.1
  LRW:   202.7   202.8   210.6   208.9   202.7   205.5
  
  Comparing the CBC values of v3 and v4 of the patch shows that porting
  the CBC variant to x86 more than doubled the performance, so the
  slightly ugly #ifdef'ed code is worth the effort.
  
  x86-64 (old):
        1. run  2. run  3. run  4. run  5. run    mean
  ECB:   121.4   120.9   121.1   121.2   120.9   121.1
  CBC:   282.5   286.3   281.5   282.0   294.5   285.3
  XTS:   263.6   260.3   263.0   267.0   264.6   263.7
  LRW:   249.6   249.8   250.5   253.4   252.2   251.1
  
  x86-64 (new):
        1. run  2. run  3. run  4. run  5. run    mean
  ECB:   122.1   122.0   122.0   127.0   121.9   123.0
  CBC:   291.2   286.2   295.6   291.4   289.9   290.8
  XTS:   263.3   264.4   264.5   264.2   270.4   265.3
  LRW:   254.9   252.3   253.6   258.2   257.5   255.3
  
  Comparing the mean values gives us:
  
  x86:      i586  aes-ni     delta
  ECB:      93.8   123.3    +31.4%
  
  Why is the improvement for ECB so small? I cannot understand it. It
  should be as big as for CBC.
 
 I don't know why the ECB variant is so slow compared to the other variants.
 But the same is true for the current x86-64 version; see the values above
 for x86-64 (old). I set up dm-crypt for this test like this:
 # cryptsetup -c aes-ecb-plain -d /dev/urandom create cfs /dev/loop0
 
 What were the numbers you measured in your tests while developing the
 x86-64 version?

I can't remember the numbers. Are you interested in digging into the issue?

Best Regards,
Huang Ying




Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86

2010-11-11 Thread Mathias Krause
On 12.11.2010, 08:34 Huang Ying wrote:
On Fri, 2010-11-12 at 15:30 +0800, Mathias Krause wrote:
 On 12.11.2010, 01:33 Huang Ying wrote:
 Hi, Mathias,
 
 On Fri, 2010-11-12 at 06:18 +0800, Mathias Krause wrote:
 All tests were run five times in a row using a 256-bit key and doing I/O
 to the block device in chunks of 1MB. The numbers are in MB/s.
 
 x86 (i586 variant):
       1. run  2. run  3. run  4. run  5. run    mean
 ECB:    93.9    93.9    94.0    93.5    93.8    93.8
 CBC:    84.9    84.8    84.9    84.9    84.8    84.8
 XTS:   108.2   108.3   109.6   108.3   108.9   108.6
 LRW:   105.0   105.0   105.1   105.1   105.1   105.0
 
 x86 (AES-NI), v3 of the patch:
       1. run  2. run  3. run  4. run  5. run    mean
 ECB:   124.8   120.8   124.5   120.6   124.5   123.0
 CBC:   112.6   109.6   112.6   110.7   109.4   110.9
 XTS:   221.6   221.1   220.9   223.5   224.4   222.3
 LRW:   206.2   209.7   207.4   203.7   209.3   207.2
 
 x86 (AES-NI), v4 of the patch:
       1. run  2. run  3. run  4. run  5. run    mean
 ECB:   122.5   121.2   121.6   125.7   125.5   123.3
 CBC:   259.5   259.2   261.2   264.0   267.6   262.3
 XTS:   225.1   230.7   220.6   217.9   216.3   222.1
 LRW:   202.7   202.8   210.6   208.9   202.7   205.5
 
 Comparing the CBC values of v3 and v4 of the patch shows that porting
 the CBC variant to x86 more than doubled the performance, so the
 slightly ugly #ifdef'ed code is worth the effort.
 
 x86-64 (old):
       1. run  2. run  3. run  4. run  5. run    mean
 ECB:   121.4   120.9   121.1   121.2   120.9   121.1
 CBC:   282.5   286.3   281.5   282.0   294.5   285.3
 XTS:   263.6   260.3   263.0   267.0   264.6   263.7
 LRW:   249.6   249.8   250.5   253.4   252.2   251.1
 
 x86-64 (new):
       1. run  2. run  3. run  4. run  5. run    mean
 ECB:   122.1   122.0   122.0   127.0   121.9   123.0
 CBC:   291.2   286.2   295.6   291.4   289.9   290.8
 XTS:   263.3   264.4   264.5   264.2   270.4   265.3
 LRW:   254.9   252.3   253.6   258.2   257.5   255.3
 
 Comparing the mean values gives us:
 
 x86:      i586  aes-ni     delta
 ECB:      93.8   123.3    +31.4%
 
 Why is the improvement for ECB so small? I cannot understand it. It
 should be as big as for CBC.
 
 I don't know why the ECB variant is so slow compared to the other variants.
 But the same is true for the current x86-64 version; see the values above
 for x86-64 (old). I set up dm-crypt for this test like this:
 # cryptsetup -c aes-ecb-plain -d /dev/urandom create cfs /dev/loop0
 
 What were the numbers you measured in your tests while developing the
 x86-64 version?
 
 I can't remember the numbers. Are you interested in digging into the issue?

Sure. Increasing performance is always a good thing to do. :)

Best regards,
Mathias


Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86

2010-11-04 Thread Mathias Krause
On 03.11.2010, 23:27 Huang Ying wrote:
 On Wed, 2010-11-03 at 14:14 -0700, Mathias Krause wrote:
 The AES-NI instructions are also available in legacy mode, so the 32-bit
 architecture may profit from those, too.
 
 To illustrate the performance gain, here's a short summary of the tcrypt
 speed test on a Core i7 M620 running at 2.67GHz comparing both assembler
 implementations:
 
 x86:                       i586         aes-ni       delta
 256 bit, 8kB blocks, ECB:  125.94 MB/s  187.09 MB/s  +48.6%
 
 Which method did you use for speed testing?
 
 modprobe tcrypt mode=200 sec=?

Yes. I used: modprobe tcrypt mode=200 sec=1

 That does not actually work very well for AES-NI, because the AES-NI
 blkcipher is tested in synchronous mode, and in that mode
 kernel_fpu_begin/end() must be called for every block, which is quite
 slow.

That's what I figured, too. Could this slowdown be avoided by saving and
restoring the FPU registers used by the assembler implementation itself,
or would that be even slower?

 At the same time, some further AES-NI optimizations (such as the
 ecb-aes-aesni driver) cannot be tested in that mode, because they are
 only available in asynchronous mode.

After finding the bug in the second version of the patch, I noticed this,
too.

 When developing AES-NI for x86_64, I used dm-crypt + AES-NI for speed
 testing, where the AES-NI blkcipher is tested in asynchronous mode and
 kernel_fpu_begin/end() is called for every page. Can you use that to
 test?

But wouldn't this be even slower than the measurement above? I took the
results for 8kB blocks, and a page would only be 4kB... well, that depends
on what kind of pages you use. IIRC, x86-64 supports not only 2MB but also
1GB pages ;)

 Or you can add test_acipher_speed (similar to test_ahash_speed) to
 test ciphers in asynchronous mode.

Maybe I'll try this approach, since it looks like just a minor 
modification of the tcrypt module.
Thanks for the hints!

Best regards,
Mathias

 
 Best Regards,
 Huang Ying


Re: [PATCH v3] x86, crypto: ported aes-ni implementation to x86

2010-11-03 Thread Huang Ying
On Wed, 2010-11-03 at 14:14 -0700, Mathias Krause wrote:
 The AES-NI instructions are also available in legacy mode, so the 32-bit
 architecture may profit from those, too.
 
 To illustrate the performance gain, here's a short summary of the tcrypt
 speed test on a Core i7 M620 running at 2.67GHz comparing both assembler
 implementations:
 
 x86:                       i586         aes-ni       delta
 256 bit, 8kB blocks, ECB:  125.94 MB/s  187.09 MB/s  +48.6%

Which method did you use for speed testing?

modprobe tcrypt mode=200 sec=?

That does not actually work very well for AES-NI, because the AES-NI
blkcipher is tested in synchronous mode, and in that mode
kernel_fpu_begin/end() must be called for every block, which is quite
slow. At the same time, some further AES-NI optimizations (such as the
ecb-aes-aesni driver) cannot be tested in that mode, because they are
only available in asynchronous mode.

When developing AES-NI for x86_64, I used dm-crypt + AES-NI for speed
testing, where the AES-NI blkcipher is tested in asynchronous mode and
kernel_fpu_begin/end() is called for every page. Can you use that to
test?

Or you can add test_acipher_speed (similar to test_ahash_speed) to
test ciphers in asynchronous mode.

Best Regards,
Huang Ying

