Andrew Savchenko <bircoph <at> gentoo.org> writes:
> > I can hardly imagine that otherwise the compiler converts integer
> > or pointer arithmetic into floating point arithmetics, or is
> > this really the case for certain flags? If yes, why should these
> > flags *ever* be useful?
> >
> > I mean: The context switching happens for non-kernel code as well,
> > doesn't it?

First off, reading this thread, I cannot really tell what the intended
use of the "highly tuned" kernels is to be. For almost all workstation
and server purposes, what has been previously stated is mostly correct.
If you really want to test these waters, do it on a system that is not
in your critical path. If you tune and experiment, you are going to
bork your box. Water cooling on the CPUs is a good idea when taxing
the FPU and other SIMD hardware on the CPU, imho. sys-power/powertop
is your friend.

> Yes, context switching happens for all code and have its costs. But
> for userspace code context switching happens for many other
> reasons, e.g. on each syscall (userspace <-> kernelspace switching).
> Also some user applications may need high precision or context
> switching pays off due to mass parallel data processing, e.g. SIMD
> instructions in scientific or multimedia applications.

Hear, hear. I knew we had an LU expert in the crowd. Most scientific
or highly parallelized number crunching does benefit from experimenting
with settings and *profiling* the results. trace-cmd and kernelshark
are in portage and are very useful for analysis of hardware timings,
context switching and a myriad of other issues. Be careful: you can
sink a lifetime into such efforts with little to show for it. The best
thing is to read up on specific optimizations for specific codes, as
vetted on the specific hardware in your processors.

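As a cheap first look at context switching before reaching for a full
tracer, here is a minimal sketch, assuming a Linux box and Python's
standard resource module. The helper names (context_switches, measure,
sleepy, busy) are made up for illustration. It reads the per-process
voluntary and involuntary context switch counters before and after a
workload; the actual numbers depend entirely on your scheduler and
system load.

```python
import resource
import time


def context_switches():
    """Return (voluntary, involuntary) context switches for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def measure(workload):
    """Run workload() and return the change in both counters."""
    vol_before, invol_before = context_switches()
    workload()
    vol_after, invol_after = context_switches()
    return vol_after - vol_before, invol_after - invol_before


def sleepy():
    # Blocking calls: the process gives up the CPU on its own.
    for _ in range(20):
        time.sleep(0.001)


def busy():
    # Pure computation: any switches here are forced by the scheduler.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


if __name__ == "__main__":
    for name, workload in (("sleepy", sleepy), ("busy", busy)):
        voluntary, involuntary = measure(workload)
        print(f"{name}: voluntary={voluntary} involuntary={involuntary}")
```

This only counts switches for one process; it says nothing about where
the time goes, which is what the tracing tools are for.
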
Tuning for one need will most likely retard other types of
performance. That is why, before you delve into these waters, you
really need to learn about profiling both target (application) and
kernel codes, *BEFORE* randomly tuning the advanced numerical
intricacies of your hardware resources. Start with memory and cgroups
before worrying about the hardware inside your processors (CPU and
GPU).

> But unless special conditions mentioned above, fixed point is still
> faster in userspace, some ffmpeg codecs have both fixed and floating
> point implementations, you may compare them. Programming in fixed point
> is much harder, so most people avoid it unless they have a very
> good reason to use it. And don't forget that kernel is
> performance critical unlike most of userspace applications.

Video (mpeg, h.264 and such) massively benefits from the enhanced
matrix abilities of the SIMD hardware in your video card's GPU. These
bare metal resources are being integrated into gcc-5.1+ for
experimentation. But it is likely going to take a year or so before
ordinary users of linux resources see these performance gains. I would
encourage you to experiment, but *never on your main workstation*. I'm
purchasing a new nvidia video card just to benchmark and tune some
numerically intensive codes that use sci-libs/magma. Although this will
be my fastest video card, it will sit in a box that is not used for
visual eye candy (gaming, anime, ray traces etc).

The mesos clustering codes (shark, storm, tachyon etc) and MP(I) codes
are going to fundamentally change the numerical processing landscape
for even small linux clusters. An excellent bit of code to get your
feet wet is sys-apps/hwloc. More than FPU, MP(I) {sys-cluster/openmpi}
and other clustering codes are going to allow you to use the DDR(4|5)
memory found in many video cards (GPU) via *RDMA*. The world is rapidly
changing and many old "fixed point integer" folks do not see the
tsunami that is just offshore.

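To make the quoted remark that fixed point is harder to program a bit
more concrete, here is a minimal sketch of Q16.16 fixed point arithmetic
on plain integers. The format choice and the helper names (to_fixed,
to_float, fx_mul, fx_div) are assumptions for illustration, not taken
from ffmpeg or any other codec. Every multiply and divide needs a manual
rescale, which is exactly the bookkeeping floating point hides.

```python
FRAC_BITS = 16            # Q16.16: 16 integer bits, 16 fractional bits
ONE = 1 << FRAC_BITS      # fixed point representation of 1.0


def to_fixed(x: float) -> int:
    """Convert a float to Q16.16, rounding to the nearest step."""
    return int(round(x * ONE))


def to_float(q: int) -> float:
    """Convert a Q16.16 value back to a float."""
    return q / ONE


def fx_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values; the raw product carries 32
    fractional bits, so shift 16 of them back out."""
    return (a * b) >> FRAC_BITS


def fx_div(a: int, b: int) -> int:
    """Divide two Q16.16 values; pre-scale the dividend so the
    quotient keeps 16 fractional bits."""
    return (a << FRAC_BITS) // b


if __name__ == "__main__":
    a = to_fixed(1.5)
    b = to_fixed(2.25)
    print(to_float(fx_mul(a, b)))                      # 3.375
    print(to_float(fx_div(to_fixed(3.0), to_fixed(2.0))))  # 1.5
```

Python integers never overflow, so this hides one more chore: in C the
intermediate product in fx_mul needs a 64-bit type, and the programmer
has to track range and rounding for every operation by hand.
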
Many computationally expensive codes have development projects to move
to an "in-memory" [1] environment where HD resources are avoided as
much as possible in a cluster environment. Clustered resources "tuned"
for such things as a video rendering farm will have very differently
optimized kernels than your KDE(G*) workstation or web server.
media-gfx/blender is another excellent collection of codes that
benefits from all sorts of tuning on a special-purpose system.

So do you really have a valid need to tune the FPU performance due to
a numerically demanding application?

YMMV

> Best regards,
> Andrew Savchenko

hth,
James

[1] https://amplab.cs.berkeley.edu/