Volker Armin Hemmann <volkerarmin <at> googlemail.com> writes:
> >>>>>> http://www.agner.org/optimize/calling_conventions.pdf
> >>>>>
> >>>>> Not sure what you're trying to say.
> >>>>>
> >>>>
> >>>> that simd is not save in kernel if not carefully guarded.
> >>>>
> >>>> Really people, just don't fuck around with the cflags.
> >>>
> >>> I still fail to see the relevance. Unless you mean using a different
> >>> -O level. In that case, yes. You shouldn't. But I was talking about
> >>> -march.
> >>>
> >>
> >> you said this
> >>
> >>>
> >>> (note that SIMD is not FP and is perfectly fine in the kernel.)
> >>
> >> and I have shown you that you are wrong.
> >
> > Not sure why you think that. The kernel crypto routines are full of
> > SIMD code (like SSE and AVX.) Automatic vectorization wouldn't work.
> > But -march is not going to introduce that
>
> and never used in interrupt context and carefully guarded. You act like
> 'oh, you can use simd instructions without any consideration' and that
> is just not true.

Volker,

Historically, you are correct. Looking forward, GCC-5.x will (can?) change this as the SIMD and other hardware, including (DDR_5) memory, all become available for (compiler) usage. For the longest time, we, the FOSS communities, have at best been given access to low-level APIs for some of these hardware resources.

All processor architectures are at war. Intel (the bastards) have had FPGAs and tools to reconfigure the amount and types of hardware in some of their processors for quite some time. The Arm64 SoCs have SIMD-centric (GPU, if you like) cores on the same SoC as the licensed 64-bit Arm CPU cores. The new GPU has already been integrated into the processor cores (same substrate), just as the i387 FPU was some decades ago. So Arm is providing 'bare metal' access to various customers and compilers. Since there are thousands of vendors building custom arm64 SoCs, there is no way for Arm to constrict access the way Intel, Nvidia and AMD have historically done. Game_set_match.

Even though the GPU cores available via arm64 are very weak compared to Nvidia's and AMD's, bare metal access to those (GPU) resources is far superior to what Intel (dragging their feet), Nvidia or AMD are offering. Just look at how AMD's Mantle has stalled for the FOSS communities. AMD, via competition from a myriad of Arm SoC vendors, is being forced to roll out 64-bit Arm server chips just to stay relevant.

Both of you guys are looking at this issue through historically color-coded sunglasses. Change is here; get on board with helping the masses help themselves to the feeding (coding) frenzy.

What a pair of really smart guys like you two should be doing is setting up a Gentoo wiki page listing and demonstrating for others how to "profile" low-level code, both kernel and system level, so these other Gentoo folks *can learn* about what you are saying by example: running tools such as kernelshark, and other performance/profiling types of analysis. Providing seamless and generic access to the GPU resources will go a long way towards revitalizing FOSS cryptographic dominance, and that is a very good thing.

ymmv.

For the record, most SIMD hardware really sucks for dense_matrix requirements. Most SIMD hardware only really works for sparse matrix apps, like x.264, because the overlying (embedded) algorithms used are poorly documented by intention from the hardware vendors. I do not have direct proof, but I strongly suspect this is the case because the SIMD pipelined memory that these low-level APIs give to the FOSS community is memory-constricted by design.

peace,
James