Re: [lng-odp] Suspected SPAM - Re: IPsec and crypto performance and OpenSSL

2017-12-13 Thread Dmitry Eremin-Solenikov
Hello,

On 12 December 2017 at 12:38, Peltonen, Janne (Nokia - FI/Espoo)
 wrote:
>
>> So, I'd suggest preallocating OpenSSL (per-thread) context memory in
>> global_init(). I guess context allocation depends on the algorithm and
>> other configuration, but we could, for example, preallocate the most
>> obvious ones (e.g. AES+SHA1 or AES-GCM) and leave all others as they are
>> today. That would be a balance between a simple solution, process support,
>> and better performance for the common case.
>
> That would help, but since the same context would be shared by
> many crypto sessions, the keys would have to be configured in the
> context in every crypto op.
>
> The current code looks like this (in every crypto op):
>
> ctx = EVP_CIPHER_CTX_new();
> EVP_EncryptInit_ex(ctx, session->cipher.evp_cipher, NULL,
>                    session->cipher.key_data, NULL);
> EVP_EncryptInit_ex(ctx, NULL, NULL, NULL, iv_ptr);
> EVP_CIPHER_CTX_set_padding(ctx, 0);
>
> ret = internal_encrypt(ctx, pkt, param);
>
> EVP_CIPHER_CTX_free(ctx);
>
> I tried this, with big speedup (and broken process support):
>
> ctx = session->perthread[odp_thread_id()].cipher_ctx;
> EVP_EncryptInit_ex(ctx, NULL, NULL, NULL, iv_ptr);
>
> ret = internal_encrypt(ctx, pkt, param);
>
> What you propose would be in between. Maybe I should try it and
> see how fast it would be.

I made an interim version (see https://github.com/Linaro/odp/pull/342).
The speedup is not very large. I will try hacking in support for a
per-session CTX plus a local CTX, but this might take some time.

>
> Janne

Re: [lng-odp] Suspected SPAM - Re: IPsec and crypto performance and OpenSSL

2017-12-12 Thread Dmitry Eremin-Solenikov
Hello,

On 12 December 2017 at 12:38, Peltonen, Janne (Nokia - FI/Espoo)
 wrote:
>
>> So, I'd suggest preallocating OpenSSL (per-thread) context memory in
>> global_init(). I guess context allocation depends on the algorithm and
>> other configuration, but we could, for example, preallocate the most
>> obvious ones (e.g. AES+SHA1 or AES-GCM) and leave all others as they are
>> today. That would be a balance between a simple solution, process support,
>> and better performance for the common case.
>
> That would help, but since the same context would be shared by
> many crypto sessions, the keys would have to be configured in the
> context in every crypto op.
>
> The current code looks like this (in every crypto op):
>
> ctx = EVP_CIPHER_CTX_new();
> EVP_EncryptInit_ex(ctx, session->cipher.evp_cipher, NULL,
>                    session->cipher.key_data, NULL);
> EVP_EncryptInit_ex(ctx, NULL, NULL, NULL, iv_ptr);
> EVP_CIPHER_CTX_set_padding(ctx, 0);
>
> ret = internal_encrypt(ctx, pkt, param);
>
> EVP_CIPHER_CTX_free(ctx);
>
> I tried this, with big speedup (and broken process support):
>
> ctx = session->perthread[odp_thread_id()].cipher_ctx;
> EVP_EncryptInit_ex(ctx, NULL, NULL, NULL, iv_ptr);
>
> ret = internal_encrypt(ctx, pkt, param);

Also note that this will break explicit IV support.

> What you propose would be in between. Maybe I should try it and
> see how fast it would be.
>
> Janne

Re: [lng-odp] Suspected SPAM - Re: IPsec and crypto performance and OpenSSL

2017-12-12 Thread Peltonen, Janne (Nokia - FI/Espoo)

> So, I'd suggest preallocating OpenSSL (per-thread) context memory in
> global_init(). I guess context allocation depends on the algorithm and
> other configuration, but we could, for example, preallocate the most
> obvious ones (e.g. AES+SHA1 or AES-GCM) and leave all others as they are
> today. That would be a balance between a simple solution, process support,
> and better performance for the common case.

That would help, but since the same context would be shared by
many crypto sessions, the keys would have to be configured in the
context in every crypto op.

The current code looks like this (in every crypto op):

ctx = EVP_CIPHER_CTX_new();
EVP_EncryptInit_ex(ctx, session->cipher.evp_cipher, NULL,
   session->cipher.key_data, NULL);
EVP_EncryptInit_ex(ctx, NULL, NULL, NULL, iv_ptr);
EVP_CIPHER_CTX_set_padding(ctx, 0);

ret = internal_encrypt(ctx, pkt, param);

EVP_CIPHER_CTX_free(ctx);

I tried this, with big speedup (and broken process support):

ctx = session->perthread[odp_thread_id()].cipher_ctx;
EVP_EncryptInit_ex(ctx, NULL, NULL, NULL, iv_ptr);

ret = internal_encrypt(ctx, pkt, param);

What you propose would be in between. Maybe I should try it and
see how fast it would be.
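
To make the trade-off between the three approaches concrete, here is a
minimal sketch that only counts the steps assumed to be expensive. It does
not call OpenSSL: mock_ctx, mock_session, the cost counters and all function
names are hypothetical stand-ins, and the contexts are embedded in the
session, so the sketch does not model the process-support problem of
contexts living on a per-process heap.

```c
#include <assert.h>
#include <string.h>

#define MAX_THREADS 4
#define KEY_LEN 16
#define IV_LEN 16

/* Hypothetical stand-in for a crypto context (not an OpenSSL type). */
struct mock_ctx {
    unsigned char key[KEY_LEN];
    unsigned char iv[IV_LEN];
    int has_key;
};

/* Counters for the steps assumed to dominate per-packet cost. */
struct cost {
    unsigned allocs;     /* context allocations */
    unsigned key_setups; /* key configurations */
    unsigned iv_setups;  /* per-operation IV loads */
};

struct mock_session {
    unsigned char key[KEY_LEN];
    struct mock_ctx perthread[MAX_THREADS]; /* used by strategy 3 only */
};

void mock_ctx_init(struct mock_ctx *ctx, struct cost *c)
{
    memset(ctx, 0, sizeof(*ctx));
    c->allocs++;
}

void mock_set_key(struct mock_ctx *ctx, const unsigned char *key, struct cost *c)
{
    memcpy(ctx->key, key, KEY_LEN);
    ctx->has_key = 1;
    c->key_setups++;
}

void mock_set_iv(struct mock_ctx *ctx, const unsigned char *iv, struct cost *c)
{
    assert(ctx->has_key); /* an IV is only meaningful once a key is set */
    memcpy(ctx->iv, iv, IV_LEN);
    c->iv_setups++;
}

/* Strategy 1: new context, key and IV in every operation. */
void op_fresh_ctx(const struct mock_session *s, const unsigned char *iv, struct cost *c)
{
    struct mock_ctx ctx;
    mock_ctx_init(&ctx, c);
    mock_set_key(&ctx, s->key, c);
    mock_set_iv(&ctx, iv, c);
}

/* Strategy 2: preallocated per-thread context shared by all sessions,
 * so the key must still be configured in every operation. */
void op_thread_ctx(struct mock_ctx *thread_ctx, const struct mock_session *s,
                   const unsigned char *iv, struct cost *c)
{
    mock_set_key(thread_ctx, s->key, c);
    mock_set_iv(thread_ctx, iv, c);
}

/* Strategy 3: one context per session and thread, keyed at session creation. */
void session_create(struct mock_session *s, const unsigned char *key, struct cost *c)
{
    memcpy(s->key, key, KEY_LEN);
    for (int t = 0; t < MAX_THREADS; t++) {
        mock_ctx_init(&s->perthread[t], c);
        mock_set_key(&s->perthread[t], key, c);
    }
}

void op_session_ctx(struct mock_session *s, int thread_id,
                    const unsigned char *iv, struct cost *c)
{
    assert(thread_id >= 0 && thread_id < MAX_THREADS);
    mock_set_iv(&s->perthread[thread_id], iv, c); /* only the IV changes */
}

enum strategy { FRESH_CTX, THREAD_CTX, SESSION_CTX };

/* Run n_ops operations on one session with the chosen strategy. */
struct cost run_ops(enum strategy how, unsigned n_ops)
{
    static const unsigned char key[KEY_LEN] = {1}, iv[IV_LEN] = {2};
    struct cost c = {0, 0, 0};
    struct mock_session s;
    struct mock_ctx thread_ctx;

    memset(&s, 0, sizeof(s));
    memset(&thread_ctx, 0, sizeof(thread_ctx));
    memcpy(s.key, key, KEY_LEN);
    if (how == THREAD_CTX)
        mock_ctx_init(&thread_ctx, &c);
    if (how == SESSION_CTX)
        session_create(&s, key, &c);

    for (unsigned i = 0; i < n_ops; i++) {
        if (how == FRESH_CTX)
            op_fresh_ctx(&s, iv, &c);
        else if (how == THREAD_CTX)
            op_thread_ctx(&thread_ctx, &s, iv, &c);
        else
            op_session_ctx(&s, 0, iv, &c);
    }
    return c;
}
```

Calling run_ops() with each strategy shows where the work goes: the
per-thread context removes the per-operation allocation but keeps the
per-operation key setup, while the per-session context leaves only the IV
load in the fast path. How much each step costs in practice would have to
be measured against the real library.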

Janne


> -----Original Message-----
> From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of Savolainen, Petri (Nokia - FI/Espoo)
> Sent: Tuesday, December 12, 2017 10:30 AM
> To: Bill Fischofer; Dmitry Eremin-Solenikov
> Cc: lng-odp@lists.linaro.org
> Subject: Suspected SPAM - Re: [lng-odp] IPsec and crypto performance and OpenSSL
> 
> We should not deliberately break process support. Since ODP and OFP are
> libraries, it's the application (e.g. NGINX) that creates the threads, and
> it may have historical or other valid reasons to fork processes instead of
> creating pthreads. Processes may be forked at many points. If we target
> fork after global init, we are making process support better instead of
> making it worse.
> 
> So, I'd suggest preallocating OpenSSL (per-thread) context memory in
> global_init(). I guess context allocation depends on the algorithm and
> other configuration, but we could, for example, preallocate the most
> obvious ones (e.g. AES+SHA1 or AES-GCM) and leave all others as they are
> today. That would be a balance between a simple solution, process support,
> and better performance for the common case.
> 
> -Petri
> 
> 
> 
> > -----Original Message-----
> > From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of Bill Fischofer
> > Sent: Tuesday, December 12, 2017 3:23 AM
> > To: Dmitry Eremin-Solenikov 
> > Cc: lng-odp@lists.linaro.org
> > Subject: Re: [lng-odp] IPsec and crypto performance and OpenSSL
> >
> > I think we've pretty much abandoned the notion that linux-generic will
> > support threads as separate processes as there seems to be little
> > justification for it. The current plan is to continue this assumption in
> > the "2.0" code base as well. So having OpenSSL rely on threads sharing the
> > same address space should not be a problem in practice.
> >
> > Any platform that has native HW crypto capabilities would certainly use
> > those in preference to OpenSSL, so again such restrictions should not be
> > of concern here.
> >
> > I agree, however, that this sort of tuning should be a follow-on activity,
> > perhaps under the general performance improvement work we want for 2.0,
> > but should not be a gating consideration for the Tiger Moth release.
> >
> > On Mon, Dec 11, 2017 at 5:19 PM, Dmitry Eremin-Solenikov <
> > dmitry.ereminsoleni...@linaro.org> wrote:
> >
> > > On 11 December 2017 at 19:14, Maxim Uvarov wrote:
> > > > odp_init_global() allocates shm, then odp_init_local() /
> > > > odp_term_local() allocates/destroys per-thread contexts in an array
> > > > in that shm. I think that has to work.
> > >
> > > The problem lies in the OpenSSL 1.1 "opaque structures" approach. They
> > > stopped providing exact struct definitions that can be embedded
> > > somewhere. Another option would be to switch to libnettle, licensed
> > > under LGPL-2.1+. However, it may be less optimized than OpenSSL.
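
The embedding problem described above can be sketched as follows. This is
an illustration only: opaque_ctx, crypto_global and the init/term functions
are hypothetical names, not ODP or OpenSSL APIs. With only a forward
declaration visible, a table in shared memory can hold pointers to
contexts, but not the contexts themselves, and a pointer into one process's
heap is not usable from another process that maps the same table.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_THREADS 8

/* Opaque handle: callers see only this forward declaration, so they can
 * hold a pointer but cannot embed the struct or take its size.
 * Hypothetical stand-in for a context type of a crypto library. */
struct opaque_ctx;
struct opaque_ctx *opaque_ctx_new(void);
void opaque_ctx_free(struct opaque_ctx *ctx);

/* Table that would live in shared memory created by global init. */
struct crypto_global {
    /* struct opaque_ctx ctx[MAX_THREADS];   would not compile here:
     * the element type is incomplete. Only pointers fit. */
    struct opaque_ctx *ctx[MAX_THREADS];
};

/* Per-thread init: allocate this thread's context and record the pointer. */
int crypto_init_local(struct crypto_global *g, int thread_id)
{
    if (thread_id < 0 || thread_id >= MAX_THREADS)
        return -1;
    if (g->ctx[thread_id] != NULL)
        return -1; /* slot already in use */
    g->ctx[thread_id] = opaque_ctx_new();
    return g->ctx[thread_id] != NULL ? 0 : -1;
}

/* Per-thread termination: free the context and clear the slot. */
int crypto_term_local(struct crypto_global *g, int thread_id)
{
    if (thread_id < 0 || thread_id >= MAX_THREADS)
        return -1;
    if (g->ctx[thread_id] == NULL)
        return -1; /* nothing to free */
    opaque_ctx_free(g->ctx[thread_id]);
    g->ctx[thread_id] = NULL;
    return 0;
}

/* "Library side": in a real opaque API this definition is hidden from
 * callers. The contents are made up for the sketch. */
struct opaque_ctx {
    unsigned char state[64];
};

struct opaque_ctx *opaque_ctx_new(void)
{
    /* Heap allocation in the calling process, not in the shared table. */
    return calloc(1, sizeof(struct opaque_ctx));
}

void opaque_ctx_free(struct opaque_ctx *ctx)
{
    free(ctx);
}
```

The table itself can be placed in shared memory, but the memory behind each
pointer comes from the heap of whichever process called
crypto_init_local(), which is the limitation for threads that run as
separate processes.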
> > >
> > > > On 11 December 2017 at 17:02, Francois Ozog wrote:
> > > >
> > > >> I favor finishing ODP (ex 2.0) integration rather than optimizing
> > > >> linux-generic at this stage.
> > > >>
> > > >> On 11 December 2017 at 14:39, Peltonen, Janne (Nokia - FI/Espoo) <
> > > >> janne.pelto...@nokia.com> wrote:
> > > >>
> > > >> > Hi,
> > > >> >
> > > >> > When playing with IPsec I noticed that the Linux generic
> > > >> > ODP implementation creates a separate OpenSSL crypto context
> > > >> > for each crypto-operation as opposed to doing it at ODP
> > > >> > crypto session creation. With IPsec this adds a lot of
> > > >> > overhead for every packet processed and significantly
> > > >> > reduces packet throughput.
> > > >> >
> > > >> > I wonder what, if anyth