Re: [gentoo-user] Re: CFLAGs for kernel compilation
On 02.05.2015 at 07:04, Nikos Chantziaras wrote:
> On 01/05/15 10:44, Andrew Savchenko wrote:
>> On Fri, 1 May 2015 05:09:51 + (UTC) Martin Vaeth wrote:
>>> Andrew Savchenko birc...@gentoo.org wrote:
>>>> That's why the kernel uses CFLAGS to make sure that no floating point instructions sneak in; you may see a lot of -mno-${instruction_set} flags when running make V=1.
>>> So it should be sufficient that the kernel does not use float or double, shouldn't it?
>> No. Optimizer paths may be far from obvious; e.g. I would not be surprised if under some conditions the vectorizer used float instructions for int code.
>
> The kernel uses -O2 and several -march variants (e.g. -march=core2). Several other options are used to prevent GCC from generating unsuitable code. Specifying another -march variant does not affect the optimizer, though; it only affects the code generator. If you don't modify the other CFLAGS and only change -march, you will not get FP instructions unless you use FP in the code.
>
> Also, I'd be very interested to see *any* optimization that would somehow transform integer code into FP code (note that SIMD is not FP and is perfectly fine in the kernel). In fact, optimizers tend to transform FP into SIMD, at least on x86 (and other architectures that have fast SIMD instructions). If I inspect the generated assembly from GCC or Clang, I cannot find FP anywhere, even for code using float and double operations. They get converted to SIMD on modern CPUs (unless you specify a compiler flag that tells it to use the FPU, for example if you need 80-bit extended precision, which is supported by the x86 FPU).

http://www.agner.org/optimize/calling_conventions.pdf

Device drivers under Linux

Linux systems use lazy saving of floating point registers and vector registers. This means that these registers are not saved and restored on every task switch. Instead they are saved/restored on the first access after a task switch. This method saves time in case no more than one thread uses these registers.

The lazy saving scheme is not supported in kernel mode. Any device driver that attempts to use these registers improperly will cause an exception that will probably make the system crash. A device driver that needs to use vector registers must first save these registers by calling the function kernel_fpu_begin() and restore the registers by calling kernel_fpu_end() before returning or sleeping. These functions also prevent pre-emptive interruption of the device driver which could otherwise mess up the registers. kernel_fpu_begin() saves all floating point registers and vector registers if available.

There is no red zone in 64-bit Linux kernel mode. The programmer should be aware of these restrictions if calling any other library than the system kernel libraries from a device driver.
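
The bracketing rule in the excerpt above can be illustrated with a small userspace mock. This is only a sketch under stated assumptions: mock_kernel_fpu_begin() and mock_kernel_fpu_end() are hypothetical stand-ins that merely track a flag, and mean_times_100() is a made-up example function. None of this is real kernel code; it only shows the shape of the pattern, with every floating point operation sitting between the two calls and only an integer leaving the guarded section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock state only: true while we are inside a guarded FPU/SIMD section. */
static bool fpu_section_active = false;

/* Hypothetical userspace stand-in for kernel_fpu_begin(); it only sets a flag. */
static void mock_kernel_fpu_begin(void)
{
    assert(!fpu_section_active);   /* this sketch does not allow nesting */
    fpu_section_active = true;
}

/* Hypothetical userspace stand-in for kernel_fpu_end(); it only clears the flag. */
static void mock_kernel_fpu_end(void)
{
    assert(fpu_section_active);
    fpu_section_active = false;
}

/*
 * Returns the mean of n integer samples, multiplied by 100 and truncated.
 * All floating point work happens between begin and end, and only an
 * integer result crosses the boundary of the guarded section.
 */
static int mean_times_100(const int *samples, size_t n)
{
    int result;

    assert(n > 0);
    mock_kernel_fpu_begin();
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += samples[i];
        result = (int)(sum * 100.0 / (double)n);
    }
    mock_kernel_fpu_end();
    return result;
}
```

In a real driver the two calls would be kernel_fpu_begin() and kernel_fpu_end(), and, as the excerpt says, the registers have to be restored before returning or sleeping.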
[gentoo-user] Re: CFLAGs for kernel compilation
On 02/05/15 14:19, Volker Armin Hemmann wrote:
> On 02.05.2015 at 07:04, Nikos Chantziaras wrote:
>> On 01/05/15 10:44, Andrew Savchenko wrote:
>>> On Fri, 1 May 2015 05:09:51 + (UTC) Martin Vaeth wrote:
>>>> Andrew Savchenko birc...@gentoo.org wrote:
>>>>> That's why the kernel uses CFLAGS to make sure that no floating point instructions sneak in; you may see a lot of -mno-${instruction_set} flags when running make V=1.
>>>> So it should be sufficient that the kernel does not use float or double, shouldn't it?
>>> No. Optimizer paths may be far from obvious; e.g. I would not be surprised if under some conditions the vectorizer used float instructions for int code.
>>
>> The kernel uses -O2 and several -march variants (e.g. -march=core2). Several other options are used to prevent GCC from generating unsuitable code. Specifying another -march variant does not affect the optimizer, though; it only affects the code generator. If you don't modify the other CFLAGS and only change -march, you will not get FP instructions unless you use FP in the code.
>>
>> Also, I'd be very interested to see *any* optimization that would somehow transform integer code into FP code (note that SIMD is not FP and is perfectly fine in the kernel). In fact, optimizers tend to transform FP into SIMD, at least on x86 (and other architectures that have fast SIMD instructions). If I inspect the generated assembly from GCC or Clang, I cannot find FP anywhere, even for code using float and double operations. They get converted to SIMD on modern CPUs (unless you specify a compiler flag that tells it to use the FPU, for example if you need 80-bit extended precision, which is supported by the x86 FPU).
>
> http://www.agner.org/optimize/calling_conventions.pdf

Not sure what you're trying to say.
Re: [gentoo-user] Re: CFLAGs for kernel compilation
On 02.05.2015 at 14:06, Nikos Chantziaras wrote:
> On 02/05/15 14:37, Volker Armin Hemmann wrote:
>> On 02.05.2015 at 13:25, Nikos Chantziaras wrote:
>>>>> The kernel uses -O2 and several -march variants (e.g. -march=core2). Several other options are used to prevent GCC from generating unsuitable code. Specifying another -march variant does not affect the optimizer, though; it only affects the code generator. If you don't modify the other CFLAGS and only change -march, you will not get FP instructions unless you use FP in the code.
>>>> http://www.agner.org/optimize/calling_conventions.pdf
>>> Not sure what you're trying to say.
>> that SIMD is not safe in the kernel if not carefully guarded. Really people, just don't fuck around with the CFLAGS.
> I still fail to see the relevance. Unless you mean using a different -O level. In that case, yes, you shouldn't. But I was talking about -march.

You said this:

(note that SIMD is not FP and is perfectly fine in the kernel.)

and I have shown you that you are wrong.
[gentoo-user] Re: CFLAGs for kernel compilation
On 02/05/15 15:10, Volker Armin Hemmann wrote:
> On 02.05.2015 at 14:06, Nikos Chantziaras wrote:
>> On 02/05/15 14:37, Volker Armin Hemmann wrote:
>>> On 02.05.2015 at 13:25, Nikos Chantziaras wrote:
>>>>>> The kernel uses -O2 and several -march variants (e.g. -march=core2). Several other options are used to prevent GCC from generating unsuitable code. Specifying another -march variant does not affect the optimizer, though; it only affects the code generator. If you don't modify the other CFLAGS and only change -march, you will not get FP instructions unless you use FP in the code.
>>>>> http://www.agner.org/optimize/calling_conventions.pdf
>>>> Not sure what you're trying to say.
>>> that SIMD is not safe in the kernel if not carefully guarded. Really people, just don't fuck around with the CFLAGS.
>> I still fail to see the relevance. Unless you mean using a different -O level. In that case, yes, you shouldn't. But I was talking about -march.
> You said this (note that SIMD is not FP and is perfectly fine in the kernel.) and I have shown you that you are wrong.

Not sure why you think that. The kernel crypto routines are full of SIMD code (like SSE and AVX). Automatic vectorization wouldn't work. But -march is not going to introduce that.
[gentoo-user] Re: CFLAGs for kernel compilation
On 02/05/15 14:37, Volker Armin Hemmann wrote:
> On 02.05.2015 at 13:25, Nikos Chantziaras wrote:
>>>> The kernel uses -O2 and several -march variants (e.g. -march=core2). Several other options are used to prevent GCC from generating unsuitable code. Specifying another -march variant does not affect the optimizer, though; it only affects the code generator. If you don't modify the other CFLAGS and only change -march, you will not get FP instructions unless you use FP in the code.
>>> http://www.agner.org/optimize/calling_conventions.pdf
>> Not sure what you're trying to say.
> that SIMD is not safe in the kernel if not carefully guarded. Really people, just don't fuck around with the CFLAGS.

I still fail to see the relevance. Unless you mean using a different -O level. In that case, yes, you shouldn't. But I was talking about -march.
Re: [gentoo-user] Re: CFLAGs for kernel compilation
On 02.05.2015 at 13:25, Nikos Chantziaras wrote:
> On 02/05/15 14:19, Volker Armin Hemmann wrote:
>> On 02.05.2015 at 07:04, Nikos Chantziaras wrote:
>>> On 01/05/15 10:44, Andrew Savchenko wrote:
>>>> On Fri, 1 May 2015 05:09:51 + (UTC) Martin Vaeth wrote:
>>>>> Andrew Savchenko birc...@gentoo.org wrote:
>>>>>> That's why the kernel uses CFLAGS to make sure that no floating point instructions sneak in; you may see a lot of -mno-${instruction_set} flags when running make V=1.
>>>>> So it should be sufficient that the kernel does not use float or double, shouldn't it?
>>>> No. Optimizer paths may be far from obvious; e.g. I would not be surprised if under some conditions the vectorizer used float instructions for int code.
>>>
>>> The kernel uses -O2 and several -march variants (e.g. -march=core2). Several other options are used to prevent GCC from generating unsuitable code. Specifying another -march variant does not affect the optimizer, though; it only affects the code generator. If you don't modify the other CFLAGS and only change -march, you will not get FP instructions unless you use FP in the code.
>>>
>>> Also, I'd be very interested to see *any* optimization that would somehow transform integer code into FP code (note that SIMD is not FP and is perfectly fine in the kernel). In fact, optimizers tend to transform FP into SIMD, at least on x86 (and other architectures that have fast SIMD instructions). If I inspect the generated assembly from GCC or Clang, I cannot find FP anywhere, even for code using float and double operations. They get converted to SIMD on modern CPUs (unless you specify a compiler flag that tells it to use the FPU, for example if you need 80-bit extended precision, which is supported by the x86 FPU).
>>
>> http://www.agner.org/optimize/calling_conventions.pdf
>
> Not sure what you're trying to say.

that SIMD is not safe in the kernel if not carefully guarded.

Really people, just don't fuck around with the CFLAGS.
Re: [gentoo-user] Re: CFLAGs for kernel compilation
On 02.05.2015 at 14:38, Nikos Chantziaras wrote:
> On 02/05/15 15:10, Volker Armin Hemmann wrote:
>> On 02.05.2015 at 14:06, Nikos Chantziaras wrote:
>>> On 02/05/15 14:37, Volker Armin Hemmann wrote:
>>>> On 02.05.2015 at 13:25, Nikos Chantziaras wrote:
>>>>>>> The kernel uses -O2 and several -march variants (e.g. -march=core2). Several other options are used to prevent GCC from generating unsuitable code. Specifying another -march variant does not affect the optimizer, though; it only affects the code generator. If you don't modify the other CFLAGS and only change -march, you will not get FP instructions unless you use FP in the code.
>>>>>> http://www.agner.org/optimize/calling_conventions.pdf
>>>>> Not sure what you're trying to say.
>>>> that SIMD is not safe in the kernel if not carefully guarded. Really people, just don't fuck around with the CFLAGS.
>>> I still fail to see the relevance. Unless you mean using a different -O level. In that case, yes, you shouldn't. But I was talking about -march.
>> You said this (note that SIMD is not FP and is perfectly fine in the kernel.) and I have shown you that you are wrong.
> Not sure why you think that. The kernel crypto routines are full of SIMD code (like SSE and AVX). Automatic vectorization wouldn't work. But -march is not going to introduce that.

And it is never used in interrupt context, and it is carefully guarded. You act like 'oh, you can use SIMD instructions without any consideration', and that is just not true.
[gentoo-user] Re: CFLAGs for kernel compilation
Volker Armin Hemmann volkerarmin at googlemail.com writes:
>>>>>>> http://www.agner.org/optimize/calling_conventions.pdf
>>>>>> Not sure what you're trying to say.
>>>>> that SIMD is not safe in the kernel if not carefully guarded. Really people, just don't fuck around with the CFLAGS.
>>>> I still fail to see the relevance. Unless you mean using a different -O level. In that case, yes, you shouldn't. But I was talking about -march.
>>> You said this (note that SIMD is not FP and is perfectly fine in the kernel.) and I have shown you that you are wrong.
>> Not sure why you think that. The kernel crypto routines are full of SIMD code (like SSE and AVX). Automatic vectorization wouldn't work. But -march is not going to introduce that.
> And it is never used in interrupt context, and it is carefully guarded. You act like 'oh, you can use SIMD instructions without any consideration', and that is just not true.

Volker,

Historically, you are correct. Looking forward, GCC 5.x will (can?) change this as SIMD and other hardware, including (DDR5) memory, all become available for (compiler) usage. For the longest time, we, the FOSS communities, have at best been given access to low-level APIs for some of these hardware resources.

All processor architectures are at war. Intel (the bastards) has had FPGAs and tools to reconfigure the amount and types of hardware in some of its processors for quite some time. The Arm64 cores have SIMD-centric (GPU, if you like) cores on the same SoC as the licensed 64-bit Arm CPU cores. The new GPU has already been integrated into the processor cores (same substrate), just as the i387 FPU was some decades ago. So Arm is providing 'bare metal' access to various customers and compilers. Since there are thousands of vendors building custom arm64 SoCs, there is no way for Arm to restrict access the way Intel, Nvidia and AMD have historically done. Game_set_match.

Even though the GPU cores available via arm64 are very weak compared to Nvidia's and AMD's, bare metal access to those (GPU) resources is far superior to what Intel (dragging its feet), Nvidia or AMD are offering. Just look at how AMD's Mantle has stalled for the FOSS communities. AMD, via competition from a myriad of Arm SoC vendors, is being forced to roll out 64-bit Arm server chips just to stay relevant.

Both of you guys are looking at this issue through historically color-coded sunglasses. Change is here; get on board with helping the masses help themselves to the feeding (coding) frenzy.

What a pair of really smart guys like you two should be doing is setting up a Gentoo wiki page listing and demonstrating for others how to profile low-level code, both kernel and system level, so that other Gentoo folks *can learn* about what you are saying by example, running tools such as kernelshark and other kinds of performance and profiling analysis. Providing seamless and generic access to the GPU resources will go a long way towards revitalizing FOSS cryptographic dominance, and that is a very good thing. ymmv.

For the record, most SIMD hardware really sucks for dense_matrix requirements. Most SIMD hardware only really works for sparse matrix apps, like x264, because the underlying (embedded) algorithms are intentionally poorly documented by the hardware vendors. I do not have direct proof, but I strongly suspect this is the case because the SIMD pipelined memory that these low-level APIs give to the FOSS community is memory constricted by design.

peace,
James
[gentoo-user] Re: CFLAGs for kernel compilation
On 01/05/15 10:44, Andrew Savchenko wrote:
> On Fri, 1 May 2015 05:09:51 + (UTC) Martin Vaeth wrote:
>> Andrew Savchenko birc...@gentoo.org wrote:
>>> That's why the kernel uses CFLAGS to make sure that no floating point instructions sneak in; you may see a lot of -mno-${instruction_set} flags when running make V=1.
>> So it should be sufficient that the kernel does not use float or double, shouldn't it?
> No. Optimizer paths may be far from obvious; e.g. I would not be surprised if under some conditions the vectorizer used float instructions for int code.

The kernel uses -O2 and several -march variants (e.g. -march=core2). Several other options are used to prevent GCC from generating unsuitable code. Specifying another -march variant does not affect the optimizer, though; it only affects the code generator. If you don't modify the other CFLAGS and only change -march, you will not get FP instructions unless you use FP in the code.

Also, I'd be very interested to see *any* optimization that would somehow transform integer code into FP code (note that SIMD is not FP and is perfectly fine in the kernel). In fact, optimizers tend to transform FP into SIMD, at least on x86 (and other architectures that have fast SIMD instructions). If I inspect the generated assembly from GCC or Clang, I cannot find FP anywhere, even for code using float and double operations. They get converted to SIMD on modern CPUs (unless you specify a compiler flag that tells it to use the FPU, for example if you need 80-bit extended precision, which is supported by the x86 FPU).
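
For anyone who wants to check the point about generated assembly themselves, a tiny test function is enough. This is a minimal sketch, assuming GCC on x86-64; the file name scale.c and the function scale_and_offset() are made up for illustration.

```c
/* scale.c (hypothetical file name): plain scalar double arithmetic. */
double scale_and_offset(double x, double factor, double offset)
{
    /* One multiply and one add on doubles; no vector types are used. */
    return x * factor + offset;
}
```

Compiling with `gcc -O2 -S scale.c` and reading scale.s should typically show scalar SSE2 instructions such as mulsd and addsd working on %xmm registers rather than x87 instructions such as fmul and fadd. With a -march value that includes FMA the two operations may be fused into a single instruction, and adding -mfpmath=387 asks GCC for x87 arithmetic instead.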
[gentoo-user] Re: CFLAGs for kernel compilation
Andrew Savchenko bircoph at gentoo.org writes:
>> I can hardly imagine that the compiler would otherwise convert integer or pointer arithmetic into floating point arithmetic, or is this really the case for certain flags? If yes, why should these flags *ever* be useful? I mean: context switching happens for non-kernel code as well, doesn't it?

First off, reading this thread, I cannot really tell what the intended use of the highly tuned kernels is to be. For almost all workstation and server purposes, what has been previously stated is mostly correct. If you really want to test these waters, do it on a system that is not in your critical path. If you tune and experiment, you are going to bork your box. Water coolers on the CPUs are a good idea when taxing the FPU and other SIMD hardware on the CPU, imho. sys-power/powertop is your friend.

> Yes, context switching happens for all code and has its costs. But for userspace code, context switching happens for many other reasons, e.g. on each syscall (userspace/kernelspace switching). Also, some user applications may need high precision, or context switching pays off due to massively parallel data processing, e.g. SIMD instructions in scientific or multimedia applications.

Hear, hear, I knew we had an LU expert in the crowd. Most scientific or highly parallelized number crunching does benefit from experimenting with settings and *profiling* the results. trace-cmd and kernelshark are in portage and are very useful for analysis of hardware timings, context switching and a myriad of other issues. Be careful: you can sink a lifetime into such efforts with little to show for it. The best thing is to read up on specific optimizations for specific codes as vetted by the specific hardware in your processors.

Tuning for one need will most likely hurt other types of performance; that is why, before you delve into these waters, you really need to learn about profiling both target (application) and kernel code, *BEFORE* randomly tuning the advanced numerical intricacies of your hardware resources. Start with memory and cgroups before worrying about the hardware inside your processors (CPU and GPU).

> But outside the special conditions mentioned above, fixed point is still faster in userspace; some ffmpeg codecs have both fixed and floating point implementations, so you may compare them. Programming in fixed point is much harder, so most people avoid it unless they have a very good reason to use it. And don't forget that the kernel is performance critical, unlike most userspace applications.

Video (MPEG, H.264 and such) massively benefits from the enhanced matrix abilities of the SIMD hardware in your video card's GPU. These bare metal resources are being integrated into gcc-5.1+ for experimentation. But it is likely going to take a year or so before ordinary users of Linux resources see these performance gains. I would encourage you to experiment, but *never on your main workstation*.

I'm purchasing a new Nvidia video card just to benchmark and tune some numerically intensive codes that use sci-libs/magma. Although this will be my fastest video card at the moment, it will sit in a box that is not used for visual eye candy (gaming, anime, ray_traces etc).

The Mesos clustering codes (shark, storm, tachyon etc) and MP(I) codes are going to fundamentally change the numerical processing landscape for even small Linux clusters. An excellent bit of code to get your feet_wet is sys-apps/hwloc.

More than the FPU, MP(I) {sys-cluster/openmpi} and other clustering codes are going to allow you to use the DDR(4|5) memory found in many video cards (GPU) via *RDMA*. The world is rapidly changing and many old fixed point integer folks do not see the Tsunami that is just off_shore.

Many computationally expensive codes have development projects to move to an in-memory [1] environment where HD resources are avoided as much as possible in a cluster environment. Clustered resources tuned for such things as a video rendering farm will have very differently optimized kernels than your KDE(G*) workstation or web server. media-gfx/blender is another excellent collection of codes that benefits from all sorts of tuning on a special_purpose system.

So do you really have a valid need to tune the FPU performance due to a numerically demanding application?

YMMV

> Best regards,
> Andrew Savchenko

hth,
James

[1] https://amplab.cs.berkeley.edu/
Re: [gentoo-user] Re: CFLAGs for kernel compilation
On Fri, 1 May 2015 05:09:51 + (UTC) Martin Vaeth wrote:
> Andrew Savchenko birc...@gentoo.org wrote:
>> That's why the kernel uses CFLAGS to make sure that no floating point instructions sneak in; you may see a lot of -mno-${instruction_set} flags when running make V=1.
> So it should be sufficient that the kernel does not use float or double, shouldn't it?

No. Optimizer paths may be far from obvious; e.g. I would not be surprised if under some conditions the vectorizer used float instructions for int code.

> I can hardly imagine that the compiler would otherwise convert integer or pointer arithmetic into floating point arithmetic, or is this really the case for certain flags? If yes, why should these flags *ever* be useful? I mean: context switching happens for non-kernel code as well, doesn't it?

Yes, context switching happens for all code and has its costs. But for userspace code, context switching happens for many other reasons, e.g. on each syscall (userspace/kernelspace switching). Also, some user applications may need high precision, or context switching pays off due to massively parallel data processing, e.g. SIMD instructions in scientific or multimedia applications.

But outside the special conditions mentioned above, fixed point is still faster in userspace; some ffmpeg codecs have both fixed and floating point implementations, so you may compare them. Programming in fixed point is much harder, so most people avoid it unless they have a very good reason to use it. And don't forget that the kernel is performance critical, unlike most userspace applications.

Best regards,
Andrew Savchenko
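
To give a feel for why fixed point takes more care than floating point, here is a minimal Q16.16 sketch. It is an illustration only, not how ffmpeg or the kernel does it; the type q16_16 and the helper names are made up for this example, and the caller is assumed to keep values small enough that results fit in 32 bits.

```c
#include <stdint.h>

/* Q16.16 fixed point: 16 integer bits and 16 fractional bits in an int32_t. */
typedef int32_t q16_16;

#define Q16_ONE 65536   /* the value 1.0 in Q16.16 */

/* Convert a small integer to Q16.16 (caller keeps |v| below 32768). */
static q16_16 q16_from_int(int32_t v)
{
    return (q16_16)(v * Q16_ONE);
}

/*
 * Multiply: widen to 64 bits so the intermediate product cannot overflow,
 * then divide by 2^16 to drop the extra fractional bits (truncates toward zero).
 */
static q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * (int64_t)b) / Q16_ONE);
}

/*
 * Divide: pre-scale the numerator by 2^16 so the quotient keeps its
 * fractional part. b must be non-zero.
 */
static q16_16 q16_div(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * Q16_ONE) / (int64_t)b);
}
```

Every multiply and divide has to rescale by hand and watch for overflow, which is bookkeeping that float and double do implicitly. For example, q16_div(q16_from_int(3), q16_from_int(2)) yields 98304, which is 1.5 in Q16.16.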
[gentoo-user] Re: CFLAGs for kernel compilation
Andrew Savchenko birc...@gentoo.org wrote:
> That's why the kernel uses CFLAGS to make sure that no floating point instructions sneak in; you may see a lot of -mno-${instruction_set} flags when running make V=1.

So it should be sufficient that the kernel does not use float or double, shouldn't it?

I can hardly imagine that the compiler would otherwise convert integer or pointer arithmetic into floating point arithmetic, or is this really the case for certain flags? If yes, why should these flags *ever* be useful? I mean: context switching happens for non-kernel code as well, doesn't it?
[gentoo-user] Re: CFLAGs for kernel compilation
On 29/04/15 16:35, Holger Hoffstätte wrote:
> On Wed, 29 Apr 2015 15:18:23 +0200, Ralf wrote:
>> Damn, you're absolutely right. I just tested it using make V=1. kernel make does override CFLAGs from the outside. But that's interesting: my processor supports -march=core-avx2 and none of the linux kernel processor family uses this flag...
>
> https://github.com/graysky2/kernel_gcc_patch

This is already applied when enabling the experimental USE flag. At least, that's what the docs claim:

$ equery uses gentoo-sources
[gentoo-user] Re: CFLAGs for kernel compilation
On 30/04/15 02:52, Nikos Chantziaras wrote:
> On 29/04/15 16:35, Holger Hoffstätte wrote:
>> On Wed, 29 Apr 2015 15:18:23 +0200, Ralf wrote:
>>> Damn, you're absolutely right. I just tested it using make V=1. kernel make does override CFLAGs from the outside. But that's interesting: my processor supports -march=core-avx2 and none of the linux kernel processor family uses this flag...
>>
>> https://github.com/graysky2/kernel_gcc_patch
>
> This is already applied when enabling the experimental USE flag. At least, that's what the docs claim:
>
> $ equery uses gentoo-sources

However, I just checked and it's not being applied. So either the documentation is wrong, or the ebuild/eclass has a bug.
[gentoo-user] Re: CFLAGs for kernel compilation
On Wed, 29 Apr 2015 15:18:23 +0200, Ralf wrote:
> Damn, you're absolutely right. I just tested it using make V=1. kernel make does override CFLAGs from the outside. But that's interesting: my processor supports -march=core-avx2 and none of the linux kernel processor family uses this flag...

https://github.com/graysky2/kernel_gcc_patch

-h