Volker Armin Hemmann <volkerarmin <at> googlemail.com> writes:

> >>>>>> http://www.agner.org/optimize/calling_conventions.pdf
> >>>>>
> >>>>> Not sure what you're trying to say.
> >>>>>
> >>>>
> >>>> that simd is not safe in kernel if not carefully guarded.
> >>>>
> >>>> Really people, just don't fuck around with the cflags.
> >>>
> >>> I still fail to see the relevance. Unless you mean using a different
> >>> -O level. In that case, yes. You shouldn't. But I was talking about
> >>> -march.
> >>>
> >>
> >> you said this
> >>
> >>>
> >>> (note that SIMD is not FP and is perfectly fine in the kernel.)
> >>
> >> and I have shown you that you are wrong.
> >
> > Not sure why you think that. The kernel crypto routines are full of
> > SIMD code (like SSE and AVX.) Automatic vectorization wouldn't work.
> > But -march is not going to introduce that
> 
> and never used in interrupt context and carefully guarded. You act like
> 'oh, you can use simd instructions without any consideration' and that
> is just not true.


Volker,
Historically, you are correct. Looking forward, GCC 5.x will (can?) change
this, as SIMD and other hardware, including (DDR5) memory, all become
available for (compiler) usage. For the longest time we, the FOSS
communities, have at best been given low-level APIs for access to
some of these hardware resources. All processor architectures are at war.
Intel (the bastards) have had FPGAs and tools to reconfigure the amount and
types of hardware in some of their processors for quite some time.

ARM64 parts have SIMD-centric (GPU, if you like) cores on the same SoC as
the licensed 64-bit ARM CPU cores. The new GPU has already been integrated
into the processor cores (same substrate), just as the i387 FPU was some
decades ago. So ARM is providing 'bare metal' access to various customers
and compilers. Since there are thousands of vendors building custom
ARM64 SoCs, there is no way for ARM to constrict access the way Intel,
Nvidia and AMD have historically done. Game, set, match.

Even though the GPU cores available via ARM64 are very weak compared to
those from Nvidia and AMD, bare metal access to those (GPU) resources is far
superior to what Intel (dragging their feet), Nvidia or AMD are offering.
Just look at how AMD's Mantle has stalled for the FOSS communities. AMD, via
competition from a myriad of ARM SoC vendors, is being forced to roll out
64-bit ARM server chips just to stay relevant. Both of you guys are looking
at this issue through historically color-coded sunglasses. Change is here;
get on board with helping the masses help themselves to the feeding (coding)
frenzy.


What a pair of really smart guys like you two should be doing is setting up
a Gentoo wiki page listing and demonstrating for others how to "profile"
low-level code, both kernel and system level, so these other Gentoo folks
*can learn* what you are saying by example: running tools such as
kernelshark and other performance/profiling analysis tools. Providing
seamless and generic access to the GPU resources will go a long way towards
revitalizing FOSS cryptographic dominance, and that is a very good thing.
YMMV.


For the record, most SIMD hardware really sucks for dense-matrix
requirements. Most SIMD hardware only really works for sparse-matrix
apps, like x264, because the underlying (embedded) algorithms used are
poorly documented, by intention, by the hardware vendors. I do not have
direct proof, but I strongly suspect this is the case because the SIMD
pipelined memory that these low-level APIs give to the FOSS community is
memory constricted by design.


peace,
James




