Hello,

On 12 December 2017 at 12:38, Peltonen, Janne (Nokia - FI/Espoo)
<janne.pelto...@nokia.com> wrote:
>
>> So, I'd suggest to preallocate Open SSL (per thread) context memory in
>> global_init(). I guess context allocation depends on algorithm, etc config,
>> but we could e.g. pre-allocate the most obvious ones (e.g. AES+SHA1, or
>> AES-GCM) and leave all others as they are today. A balance between simple
>> solution, process support and better performance for the common case.
>
> That would help but since the same context would be shared by
> many crypto sessions the keys would have to be configured in the
> context in every crypto op.
>
> The current code looks like this (in every crypto op):
>
>     ctx = EVP_CIPHER_CTX_new();
>     EVP_EncryptInit_ex(ctx, session->cipher.evp_cipher, NULL,
>                        session->cipher.key_data, NULL);
>     EVP_EncryptInit_ex(ctx, NULL, NULL, NULL, iv_ptr);
>     EVP_CIPHER_CTX_set_padding(ctx, 0);
>
>     ret = internal_encrypt(ctx, pkt, param);
>
>     EVP_CIPHER_CTX_free(ctx);
>
> I tried this, with big speedup (and broken process support):
>
>     ctx = session->perthread[odp_thread_id()].cipher_ctx;
>     EVP_EncryptInit_ex(ctx, NULL, NULL, NULL, iv_ptr);
>
>     ret = internal_encrypt(ctx, pkt, param);

Also note that this will break explicit IV support.

> What you propose, would be in between. Maybe I should try and
> see how fast it would be.
>
> Janne
>
>
>> -----Original Message-----
>> From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of
>> Savolainen, Petri (Nokia - FI/Espoo)
>> Sent: Tuesday, December 12, 2017 10:30 AM
>> To: Bill Fischofer <bill.fischo...@linaro.org>; Dmitry Eremin-Solenikov
>> <dmitry.ereminsoleni...@linaro.org>
>> Cc: lng-odp@lists.linaro.org
>> Subject: Suspected SPAM - Re: [lng-odp] IPsec and crypto performance and
>> OpenSSL
>>
>> We should not deliberately break process support. Since ODP and OFP are
>> libraries, it's the application (e.g. NGINX) that creates the threads and
>> it may have historical or other valid reasons to fork processes instead of
>> creating pthreads. Processes may be forked at many points. If we target
>> fork after global init, we are making process support better instead of
>> making it worse.
>>
>> So, I'd suggest to preallocate Open SSL (per thread) context memory in
>> global_init(). I guess context allocation depends on algorithm, etc config,
>> but we could e.g. pre-allocate the most obvious ones (e.g. AES+SHA1, or
>> AES-GCM) and leave all others as they are today. A balance between simple
>> solution, process support and better performance for the common case.
>>
>> -Petri
>>
>>
>>
>> > -----Original Message-----
>> > From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of Bill
>> > Fischofer
>> > Sent: Tuesday, December 12, 2017 3:23 AM
>> > To: Dmitry Eremin-Solenikov <dmitry.ereminsoleni...@linaro.org>
>> > Cc: lng-odp@lists.linaro.org
>> > Subject: Re: [lng-odp] IPsec and crypto performance and OpenSSL
>> >
>> > I think we've pretty much abandoned the notion that linux-generic will
>> > support threads as separate processes as there seems to be little
>> > justification for it.
>> > The current plan is to continue this assumption in
>> > the "2.0" code base as well. So having OpenSSL rely on threads sharing the
>> > same address space should not be a problem in practice.
>> >
>> > Any platform that has native HW crypto capabilities would certainly use
>> > those in preference to OpenSSL, so again such restrictions should not be of
>> > concern here.
>> >
>> > I agree, however, that this sort of tuning should be a follow-on activity,
>> > perhaps under the general performance improvement work we want for 2.0,
>> > but should not be a gating consideration for the Tiger Moth release.
>> >
>> > On Mon, Dec 11, 2017 at 5:19 PM, Dmitry Eremin-Solenikov <
>> > dmitry.ereminsoleni...@linaro.org> wrote:
>> >
>> > > On 11 December 2017 at 19:14, Maxim Uvarov <maxim.uva...@linaro.org>
>> > > wrote:
>> > > > odp_init_global() allocates shm, then odp_init_local() /
>> > > > odp_term_local() allocates/destroys per thread contexts in array in
>> > > > that shm. I think that has to work.
>> > >
>> > > The problem lies in OpenSSL 1.1 "opaque structures" approach. They
>> > > stopped providing exact struct definitions which can be embedded
>> > > somewhere. Another option would be to switch to libnettle licensed
>> > > under LGPL-2.1+. It may be however less optimized compared to OpenSSL.
>> > >
>> > > > On 11 December 2017 at 17:02, Francois Ozog <francois.o...@linaro.org>
>> > > > wrote:
>> > > >
>> > > >> I favor finishing ODP (ex 2.0) integration rather than optimizing
>> > > >> linux-generic at this stage.
>> > > >>
>> > > >> On 11 December 2017 at 14:39, Peltonen, Janne (Nokia - FI/Espoo) <
>> > > >> janne.pelto...@nokia.com> wrote:
>> > > >>
>> > > >> > Hi,
>> > > >> >
>> > > >> > When playing with IPsec I noticed that the Linux generic
>> > > >> > ODP implementation creates a separate OpenSSL crypto context
>> > > >> > for each crypto-operation as opposed to doing it at ODP
>> > > >> > crypto session creation.
>> > > >> > With IPsec this adds a lot of
>> > > >> > overhead for every packet processed and significantly
>> > > >> > reduces packet throughput.
>> > > >> >
>> > > >> > I wonder what, if anything, should be done about it.
>> > > >> >
>> > > >> > I already almost sent a patch to create and initialize
>> > > >> > crypto contexts only once per session but realized that
>> > > >> > it is not that easy.
>> > > >> >
>> > > >> > Here are some alternatives that came to my mind, but all
>> > > >> > of them have their own problems:
>> > > >> >
>> > > >> > a) Create per-thread OpenSSL contexts at crypto session
>> > > >> >    creation time.
>> > > >> >    - Does not work with ODP threads that do not share
>> > > >> >      their address space since OpenSSL is allocating
>> > > >> >      memory through malloc() during context creation.
>> > > >> >
>> > > >> > b) Do a) plus provide OpenSSL a custom memory allocator
>> > > >> >    on top of shared memory.
>> > > >> >    - There is no generic heap allocator in ODP code base.
>> > > >> >
>> > > >> > c) Create per-thread contexts lazily when needed.
>> > > >> >    - Creation would work as it would happen in the right
>> > > >> >      thread but there would be no way to delete the
>> > > >> >      contexts. The thread destroying the ODP crypto
>> > > >> >      session cannot delete the per-thread contexts that
>> > > >> >      might reside in different address spaces. That
>> > > >> >      thread could ask every other thread to do the
>> > > >> >      per-thread cleanup, except that there is no mechanism
>> > > >> >      for that without application assistance or big
>> > > >> >      changes in the generic ODP code.
>> > > >> >
>> > > >> > d) Create a limited-size cache of per-thread contexts.
>> > > >> >    - This would allow postponing the deletion of each
>> > > >> >      context either to the point the cache slot needs
>> > > >> >      to be reused or all the way to ODP termination,
>> > > >> >      both occurring in the right thread. But this is
>> > > >> >      getting complicated and sizing the cache is nasty.
>> > > >> >
>> > > >> > Any thoughts?
>> > > >> >
>> > > >> > Janne
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >>
>> > > >>
>> > > >> --
>> > > >> [image: Linaro] <http://www.linaro.org/>
>> > > >> François-Frédéric Ozog | *Director Linaro Networking Group*
>> > > >> T: +33.67221.6485
>> > > >> francois.o...@linaro.org | Skype: ffozog
>> > > >>
>> > > >
>> > > >
>> > >
>> > > --
>> > > With best wishes
>> > > Dmitry
>> > >

--
With best wishes
Dmitry
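
For reference, option d) from the original message (a limited-size cache of per-thread contexts) could look roughly like the sketch below. This is only an illustration under stated assumptions, not code from ODP or OpenSSL: fake_ctx_t is a stand-in for an opaque EVP_CIPHER_CTX, the cache size of 4 and the LRU eviction policy are arbitrary choices, and every name (ctx_cache_get, ctx_cache_flush, and so on) is hypothetical. The point it shows is that each context is created, evicted and freed only by the thread that owns it, so nothing has to be deleted across address spaces.

```c
/*
 * Sketch of option d): a small per-thread cache of crypto contexts.
 * Everything here is a hypothetical illustration: fake_ctx_t stands in
 * for an OpenSSL EVP_CIPHER_CTX, and none of these names exist in ODP
 * or OpenSSL.
 */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CTX_CACHE_SLOTS 4 /* assumed size; sizing is the hard part */

typedef struct {
    uint32_t session_id; /* session the context was set up for */
} fake_ctx_t;

typedef struct {
    uint32_t session_id; /* 0 marks an empty slot */
    uint64_t last_used;  /* logical clock value; 0 for an empty slot */
    fake_ctx_t *ctx;
} ctx_slot_t;

/* One cache per thread: contexts never cross thread boundaries. */
static _Thread_local ctx_slot_t ctx_cache[CTX_CACHE_SLOTS];
static _Thread_local uint64_t ctx_clock;
static _Thread_local unsigned ctx_allocs, ctx_frees; /* illustration only */

static fake_ctx_t *fake_ctx_new(uint32_t session_id)
{
    fake_ctx_t *ctx = malloc(sizeof(*ctx));

    if (ctx != NULL) {
        ctx->session_id = session_id;
        ctx_allocs++;
    }
    return ctx;
}

static void fake_ctx_free(fake_ctx_t *ctx)
{
    if (ctx != NULL) {
        free(ctx);
        ctx_frees++;
    }
}

/*
 * Return the calling thread's context for session_id. On a miss, create
 * one and place it in an empty slot, or evict the least recently used
 * slot if the cache is full. Returns NULL if allocation fails.
 */
static fake_ctx_t *ctx_cache_get(uint32_t session_id)
{
    ctx_slot_t *victim = &ctx_cache[0];

    assert(session_id != 0); /* 0 is reserved for empty slots */

    for (int i = 0; i < CTX_CACHE_SLOTS; i++) {
        ctx_slot_t *slot = &ctx_cache[i];

        if (slot->session_id == session_id) {
            slot->last_used = ++ctx_clock;
            return slot->ctx;
        }
        /* Empty slots have last_used == 0, so they are picked first. */
        if (slot->last_used < victim->last_used)
            victim = slot;
    }

    fake_ctx_t *ctx = fake_ctx_new(session_id);

    if (ctx == NULL)
        return NULL;

    fake_ctx_free(victim->ctx); /* deletion happens in the owner thread */
    victim->session_id = session_id;
    victim->last_used = ++ctx_clock;
    victim->ctx = ctx;
    return ctx;
}

/* Free every cached context; meant to run at per-thread termination. */
static void ctx_cache_flush(void)
{
    for (int i = 0; i < CTX_CACHE_SLOTS; i++) {
        fake_ctx_free(ctx_cache[i].ctx);
        ctx_cache[i] = (ctx_slot_t){ 0 };
    }
}
```

A crypto op would call ctx_cache_get(session_id) instead of creating a fresh context, and the per-thread termination path would call ctx_cache_flush(). The sketch does not handle a session being destroyed and its id reused while a stale entry is still cached; a real version would probably need something like a per-session generation counter for that, which is part of why this option gets complicated.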