Hello,

On 12 December 2017 at 12:38, Peltonen, Janne (Nokia - FI/Espoo) <janne.pelto...@nokia.com> wrote:
>
>> So, I'd suggest preallocating OpenSSL (per-thread) context memory in
>> global_init(). I guess context allocation depends on the algorithm
>> and other configuration, but we could e.g. preallocate the most
>> obvious ones (e.g. AES+SHA1, or AES-GCM) and leave all others as
>> they are today. A balance between a simple solution, process support
>> and better performance for the common case.
>
> That would help, but since the same context would be shared by
> many crypto sessions, the keys would have to be configured in the
> context in every crypto op.
>
> The current code looks like this (in every crypto op):
>
>         ctx = EVP_CIPHER_CTX_new();
>         EVP_EncryptInit_ex(ctx, session->cipher.evp_cipher, NULL,
>                            session->cipher.key_data, NULL);
>         EVP_EncryptInit_ex(ctx, NULL, NULL, NULL, iv_ptr);
>         EVP_CIPHER_CTX_set_padding(ctx, 0);
>
>         ret = internal_encrypt(ctx, pkt, param);
>
>         EVP_CIPHER_CTX_free(ctx);
>
> I tried this, with a big speedup (and broken process support):
>
>         ctx = session->perthread[odp_thread_id()].cipher_ctx;
>         EVP_EncryptInit_ex(ctx, NULL, NULL, NULL, iv_ptr);
>
>         ret = internal_encrypt(ctx, pkt, param);

Also note that this will break explicit IV support.

> What you propose would be in between. Maybe I should try it and
> see how fast it would be.
>
>         Janne
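A minimal sketch of the in-between variant, assuming one preallocated context per thread that all sessions on that thread share. Everything in it is hypothetical: mock_ctx_t stands in for EVP_CIPHER_CTX, mock_init() only mimics the convention that a NULL argument to EVP_EncryptInit_ex() leaves that setting unchanged, and thread ids are assumed to be small integers. What it illustrates is that cipher and key must be re-applied on every op, because the previous op on the same thread may have used another session:

```c
#include <assert.h>
#include <string.h>

#define MAX_THREADS 4
#define KEY_LEN 16
#define IV_LEN 12

/* Hypothetical stand-in for EVP_CIPHER_CTX: it only records the
 * cipher, key and IV it was last initialized with. */
typedef struct {
    int cipher;                    /* hypothetical cipher id, 0 = unset */
    unsigned char key[KEY_LEN];
    unsigned char iv[IV_LEN];
} mock_ctx_t;

/* Hypothetical session: just the fields the op path needs. */
typedef struct {
    int cipher;
    unsigned char key[KEY_LEN];
} session_t;

/* One preallocated context per thread, shared by all sessions. */
static mock_ctx_t thread_ctx[MAX_THREADS];

/* Mimics the EVP_EncryptInit_ex() convention: a NULL (or 0) argument
 * leaves the corresponding setting unchanged. */
static void mock_init(mock_ctx_t *ctx, int cipher,
                      const unsigned char *key, const unsigned char *iv)
{
    if (cipher != 0)
        ctx->cipher = cipher;
    if (key != NULL)
        memcpy(ctx->key, key, KEY_LEN);
    if (iv != NULL)
        memcpy(ctx->iv, iv, IV_LEN);
}

/* Per-op setup. The previous op on this thread may have used another
 * session, so cipher and key are re-applied before the IV every time. */
static mock_ctx_t *op_prepare(int thread_id, const session_t *s,
                              const unsigned char *iv)
{
    assert(thread_id >= 0 && thread_id < MAX_THREADS);

    mock_ctx_t *ctx = &thread_ctx[thread_id];

    mock_init(ctx, s->cipher, s->key, NULL); /* re-key on every op */
    mock_init(ctx, 0, NULL, iv);             /* then set the IV */
    return ctx;
}
```

With real OpenSSL the re-keying call would redo the cipher's key setup on each op, so it saves the allocation and free but not the key schedule.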
>
>
>> -----Original Message-----
>> From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of Savolainen, Petri (Nokia - FI/Espoo)
>> Sent: Tuesday, December 12, 2017 10:30 AM
>> To: Bill Fischofer <bill.fischo...@linaro.org>; Dmitry Eremin-Solenikov <dmitry.ereminsoleni...@linaro.org>
>> Cc: lng-odp@lists.linaro.org
>> Subject: Re: [lng-odp] IPsec and crypto performance and OpenSSL
>>
>> We should not deliberately break process support. Since ODP and OFP
>> are libraries, it's the application (e.g. NGINX) that creates the
>> threads, and it may have historical or other valid reasons to fork
>> processes instead of creating pthreads. Processes may be forked at
>> many points. If we target fork after global init, we are making
>> process support better instead of making it worse.
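Why targeting fork after global init helps can be checked with a small POSIX-only sketch (no ODP or OpenSSL involved; prealloc_ctx_t and child_sees_prealloc() are hypothetical names). Memory allocated before fork() is present in the child at the same virtual address, as a private copy-on-write copy, so a per-thread context preallocated in global_init() stays usable in a process forked later:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-thread context preallocated at "global init". */
typedef struct {
    int magic;
} prealloc_ctx_t;

/* Returns 1 if a child forked after the allocation can use the context
 * at the same address (with its own private copy), 0 if not, -1 on error. */
static int child_sees_prealloc(void)
{
    prealloc_ctx_t *ctx = malloc(sizeof(*ctx)); /* "global init" */
    int status;
    int ok;
    pid_t pid;

    if (ctx == NULL)
        return -1;
    ctx->magic = 0x5eed;

    pid = fork();                     /* process created after init */
    if (pid < 0) {
        free(ctx);
        return -1;
    }
    if (pid == 0) {
        /* Child: same pointer value, copy-on-write data. */
        ctx->magic++;                 /* does not affect the parent */
        _exit(ctx->magic == 0x5eed + 1 ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) < 0) {
        free(ctx);
        return -1;
    }
    ok = WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
         ctx->magic == 0x5eed;        /* parent copy unchanged */
    free(ctx);
    return ok;
}
```

A context allocated after the fork, by contrast, exists only in the process that allocated it.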
>>
>> So, I'd suggest preallocating OpenSSL (per-thread) context memory in
>> global_init(). I guess context allocation depends on the algorithm
>> and other configuration, but we could e.g. preallocate the most
>> obvious ones (e.g. AES+SHA1, or AES-GCM) and leave all others as
>> they are today. A balance between a simple solution, process support
>> and better performance for the common case.
>>
>> -Petri
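A rough sketch of the preallocation layout Petri describes, assuming a fixed MAX_THREADS and a small enum of "common" algorithms (all names hypothetical; mock_ctx_t stands in for the OpenSSL context). Common algorithms get one context per thread, allocated once at global init, while everything else keeps today's allocate-per-op path:

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_THREADS 8

/* Hypothetical algorithm ids. */
enum alg {
    ALG_AES_CBC_HMAC_SHA1,         /* preallocated */
    ALG_AES_GCM,                   /* preallocated */
    ALG_NUM_PREALLOC,              /* number of preallocated algorithms */
    ALG_OTHER = ALG_NUM_PREALLOC   /* anything else: allocated per op */
};

/* Hypothetical stand-in for an OpenSSL cipher context. */
typedef struct {
    enum alg alg;
} mock_ctx_t;

static mock_ctx_t *prealloc[MAX_THREADS][ALG_NUM_PREALLOC];

/* Frees every preallocated context (free(NULL) is a no-op). */
static void prealloc_term(void)
{
    for (int t = 0; t < MAX_THREADS; t++)
        for (int a = 0; a < ALG_NUM_PREALLOC; a++) {
            free(prealloc[t][a]);
            prealloc[t][a] = NULL;
        }
}

/* Called once from global init, i.e. before any process is forked. */
static int prealloc_init(void)
{
    for (int t = 0; t < MAX_THREADS; t++)
        for (int a = 0; a < ALG_NUM_PREALLOC; a++) {
            prealloc[t][a] = malloc(sizeof(mock_ctx_t));
            if (prealloc[t][a] == NULL) {
                prealloc_term();
                return -1;
            }
            prealloc[t][a]->alg = (enum alg)a;
        }
    return 0;
}

/* Returns a context for one op. *owned is set to 1 when the caller must
 * free the context afterwards (today's allocate-per-op path). */
static mock_ctx_t *ctx_get(int thread_id, enum alg alg, int *owned)
{
    assert(thread_id >= 0 && thread_id < MAX_THREADS);

    if (alg < ALG_NUM_PREALLOC) {
        *owned = 0;
        return prealloc[thread_id][alg];
    }

    mock_ctx_t *ctx = malloc(sizeof(*ctx));
    if (ctx != NULL)
        ctx->alg = alg;
    *owned = 1;
    return ctx;
}
```

Because such a context is shared by all sessions of the same algorithm on a thread, the key would still have to be set on every op.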
>>
>>
>>
>> > -----Original Message-----
>> > From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of Bill Fischofer
>> > Sent: Tuesday, December 12, 2017 3:23 AM
>> > To: Dmitry Eremin-Solenikov <dmitry.ereminsoleni...@linaro.org>
>> > Cc: lng-odp@lists.linaro.org
>> > Subject: Re: [lng-odp] IPsec and crypto performance and OpenSSL
>> >
>> > I think we've pretty much abandoned the notion that linux-generic will
>> > support threads as separate processes, as there seems to be little
>> > justification for it. The current plan is to keep this assumption in
>> > the "2.0" code base as well. So having OpenSSL rely on threads sharing the
>> > same address space should not be a problem in practice.
>> >
>> > Any platform that has native HW crypto capabilities would certainly use
>> > those in preference to OpenSSL, so again such restrictions should not be
>> > of concern here.
>> >
>> > I agree, however, that this sort of tuning should be a follow-on activity,
>> > perhaps under the general performance improvement work we want for 2.0,
>> > but should not be a gating consideration for the Tiger Moth release.
>> >
>> > On Mon, Dec 11, 2017 at 5:19 PM, Dmitry Eremin-Solenikov <dmitry.ereminsoleni...@linaro.org> wrote:
>> >
>> > > On 11 December 2017 at 19:14, Maxim Uvarov <maxim.uva...@linaro.org> wrote:
>> > > > odp_init_global() allocates shm, then odp_init_local() /
>> > > > odp_term_local() allocate/destroy per-thread contexts in an array
>> > > > in that shm. I think that should work.
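A minimal sketch of that layout, assuming hypothetical crypto_init_global()/crypto_init_local()/crypto_term_local() hooks (not existing ODP functions) and an anonymous MAP_SHARED mapping standing in for ODP shm. Since OpenSSL 1.1 structures are opaque (see Dmitry's reply below), the shm array can only hold pointers; the context memory itself still comes from the calling thread's own heap:

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

#define MAX_THREADS 8

/* Hypothetical stand-in for an OpenSSL context; the real one is opaque,
 * so only a pointer to it can be stored in shm. */
typedef struct {
    int in_use;
} mock_ctx_t;

/* Per-thread slots kept in the shared memory created at global init. */
typedef struct {
    mock_ctx_t *ctx[MAX_THREADS];
} crypto_shm_t;

static crypto_shm_t *shm;

static int crypto_init_global(void)
{
    void *p = mmap(NULL, sizeof(crypto_shm_t), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    shm = p; /* anonymous mappings are zero-filled: all slots NULL */
    return 0;
}

static int crypto_init_local(int thread_id)
{
    assert(thread_id >= 0 && thread_id < MAX_THREADS);
    /* Allocated from this thread's (or process's) private heap. */
    shm->ctx[thread_id] = calloc(1, sizeof(mock_ctx_t));
    return shm->ctx[thread_id] != NULL ? 0 : -1;
}

static void crypto_term_local(int thread_id)
{
    assert(thread_id >= 0 && thread_id < MAX_THREADS);
    free(shm->ctx[thread_id]);
    shm->ctx[thread_id] = NULL;
}

static void crypto_term_global(void)
{
    munmap(shm, sizeof(crypto_shm_t));
    shm = NULL;
}
```

This holds up as long as each slot is only dereferenced by the thread that filled it; the pointer value is visible to other processes, but the memory behind it is not.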
>> > >
>> > > The problem lies in OpenSSL 1.1's "opaque structures" approach. They
>> > > stopped providing the exact struct definitions, so the structures can
>> > > no longer be embedded elsewhere. Another option would be to switch to
>> > > libnettle, licensed under LGPL-2.1+. It may, however, be less
>> > > optimized than OpenSSL.
>> > >
>> > > > On 11 December 2017 at 17:02, Francois Ozog <francois.o...@linaro.org> wrote:
>> > > >
>> > > >> I favor finishing ODP (ex 2.0) integration rather than optimizing
>> > > >> linux-generic at this stage.
>> > > >>
>> > > >> On 11 December 2017 at 14:39, Peltonen, Janne (Nokia - FI/Espoo) <janne.pelto...@nokia.com> wrote:
>> > > >>
>> > > >> > Hi,
>> > > >> >
>> > > >> > When playing with IPsec I noticed that the Linux generic
>> > > >> > ODP implementation creates a separate OpenSSL crypto context
>> > > >> > for each crypto operation as opposed to doing it at ODP
>> > > >> > crypto session creation. With IPsec this adds a lot of
>> > > >> > overhead for every packet processed and significantly
>> > > >> > reduces packet throughput.
>> > > >> >
>> > > >> > I wonder what, if anything, should be done about it.
>> > > >> >
>> > > >> > I almost sent a patch to create and initialize
>> > > >> > crypto contexts only once per session, but realized that
>> > > >> > it is not that easy.
>> > > >> >
>> > > >> > Here are some alternatives that came to my mind, but all
>> > > >> > of them have their own problems:
>> > > >> >
>> > > >> > a) Create per-thread OpenSSL contexts at crypto session
>> > > >> >    creation time.
>> > > >> >    - Does not work with ODP threads that do not share
>> > > >> >      their address space since OpenSSL is allocating
>> > > >> >      memory through malloc() during context creation.
>> > > >> >
>> > > >> > b) Do a) plus provide OpenSSL a custom memory allocator
>> > > >> >    on top of shared memory.
>> > > >> >    - There is no generic heap allocator in ODP code base.
>> > > >> >
>> > > >> > c) Create per-thread contexts lazily when needed.
>> > > >> >    - Creation would work as it would happen in the right
>> > > >> >      thread but there would be no way to delete the
>> > > >> >      contexts. The thread destroying the ODP crypto
>> > > >> >      session cannot delete the per-thread contexts that
>> > > >> >      might reside in different address spaces. That
>> > > >> >      thread could ask every other thread to do the
>> > > >> >      per-thread cleanup, except that there is no mechanism
>> > > >> >      for that without application assistance or big
>> > > >> >      changes in the generic ODP code.
>> > > >> >
>> > > >> > d) Create a limited-size cache of per-thread contexts.
>> > > >> >    - This would allow postponing the deletion of each
>> > > >> >      context either to the point the cache slot needs
>> > > >> >      to be reused or all the way to ODP termination,
>> > > >> >      both occurring in the right thread. But this is
>> > > >> >      getting complicated and sizing the cache is nasty.
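A minimal sketch of option d), assuming a direct-mapped per-thread cache indexed by session id plus a per-session generation counter (both hypothetical design choices; an LRU policy would be an alternative, and CACHE_SLOTS is an arbitrary placeholder). mock_ctx_t stands in for an EVP_CIPHER_CTX keyed for one session. Contexts are only ever created and freed by the owning thread, either on eviction or at local termination:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_SLOTS 4 /* arbitrary; sizing is the hard part */

/* Hypothetical stand-in for an OpenSSL context keyed for one session. */
typedef struct {
    uint32_t session_id;
} mock_ctx_t;

/* The generation number guards against a destroyed session's id being
 * reused by a new session that would otherwise hit a stale context. */
typedef struct {
    uint32_t session_id;
    uint32_t generation;
    mock_ctx_t *ctx; /* NULL if the slot is empty */
} cache_entry_t;

/* _Thread_local gives each thread its own copy of the cache. */
static _Thread_local cache_entry_t cache[CACHE_SLOTS];

/* Looks up (or creates) this thread's context for a session. Any context
 * evicted here is freed by the same thread that created it. */
static mock_ctx_t *cache_get(uint32_t session_id, uint32_t generation)
{
    cache_entry_t *e = &cache[session_id % CACHE_SLOTS];

    if (e->ctx != NULL && e->session_id == session_id &&
        e->generation == generation)
        return e->ctx;                /* hit */

    free(e->ctx);                     /* evict other or stale entry */
    e->ctx = malloc(sizeof(*e->ctx)); /* create and "key" for session */
    if (e->ctx == NULL)
        return NULL;
    e->ctx->session_id = session_id;
    e->session_id = session_id;
    e->generation = generation;
    return e->ctx;
}

/* Called from the thread's local termination: drop whatever is left. */
static void cache_term_local(void)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        free(cache[i].ctx);
        cache[i].ctx = NULL;
    }
}
```

A destroyed session's contexts simply linger until their slot is reused or the thread terminates, which is the deferred deletion described above.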
>> > > >> >
>> > > >> > Any thoughts?
>> > > >> >
>> > > >> >         Janne
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >>
>> > > >>
>> > > >> --
>> > > >> François-Frédéric Ozog | *Director Linaro Networking Group*
>> > > >> T: +33.67221.6485
>> > > >> francois.o...@linaro.org | Skype: ffozog
>> > > >>
>> > >
>> > >
>> > >
>> > > --
>> > > With best wishes
>> > > Dmitry
>> > >



-- 
With best wishes
Dmitry
