Andrew Savchenko <bircoph <at> gentoo.org> writes:

> > I can hardly imagine that otherwise the compiler converts integer
> > or pointer arithmetic into floating point arithmetics, or is
> > this really the case for certain flags?  If yes, why should these
> > flags *ever* be useful?
> > I mean: The context switching happens for non-kernel code as well,
> > doesn't it?


First off, reading this thread, I cannot really tell what the intended use
of the "highly tuned" kernels is to be. For almost all workstation
and server purposes, what has been previously stated is mostly correct. If
you really want to test these waters, do it on a system that is not in your
critical path. If you tune and experiment, you are going to bork your box.
Water cooling on the CPUs is a good idea when taxing the FPU and other SIMD
hardware on the CPU, imho. sys-power/powertop is your friend.


> Yes, context switching happens for all code and has its costs. But
> for userspace code context switching happens for many other
> reasons, e.g. on each syscall (userspace <-> kernelspace switching).
> Also some user applications may need high precision or context
> switching pays off due to mass parallel data processing, e.g. SIMD
> instructions in scientific or multimedia applications. 

Hear, hear, I knew we had an LU expert in the crowd. Most scientific
or highly parallelized number crunching does benefit from experimenting
with settings and *profiling* the results. trace-cmd and kernelshark
are in portage and are very useful for analysis of hardware timings,
context switching and a myriad of other issues. Be careful, you can
sink a lifetime into such efforts with little to show for it.
The best thing is to read up on specific optimizations for specific
codes as vetted on the specific hardware in your processors. Tuning for
one need will most likely hurt other types of performance; that is
why, before you delve into these waters, you really need to learn about
profiling both target (application) and kernel codes, *BEFORE* randomly
tuning the advanced numerical intricacies of your hardware resources.
Start with memory and cgroups before worrying about the hardware inside
your processors (cpu and gpu).
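
As a cheap first look before reaching for trace-cmd, the kernel already
keeps per-process context switch counters. A minimal sketch, assuming
Python's standard `resource` module on a Linux box; the helper name
count_context_switches and the two toy workloads are made up for
illustration and are not part of any tool mentioned above:

```python
import resource
import time


def count_context_switches(workload):
    """Run workload() and return (voluntary, involuntary) context switches.

    Hypothetical helper for illustration; it reads the per-process
    counters the kernel reports through getrusage(2).
    """
    before = resource.getrusage(resource.RUSAGE_SELF)
    workload()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_nvcsw - before.ru_nvcsw,
            after.ru_nivcsw - before.ru_nivcsw)


def sleepy():
    # Each sleep blocks in the kernel, which normally counts as a
    # voluntary context switch.
    for _ in range(20):
        time.sleep(0.001)


def busy():
    # Pure userspace arithmetic; any switches seen here come from the
    # scheduler preempting the process.
    total = 0.0
    for i in range(200_000):
        total += i * 0.5
    return total


if __name__ == "__main__":
    print("sleepy:", count_context_switches(sleepy))
    print("busy:  ", count_context_switches(busy))
```

The numbers depend on the machine and its load, so treat them as a rough
baseline to compare before and after a tuning change, not as a benchmark.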


> But barring the special conditions mentioned above, fixed point is still
> faster in userspace, some ffmpeg codecs have both fixed and floating
> point implementations, you may compare them. Programming in fixed point
> is much harder, so most people avoid it unless they have a very
> good reason to use it. And don't forget that the kernel is
> performance critical unlike most userspace applications.
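
To make the "much harder" part concrete, here is a minimal sketch of
fixed point bookkeeping, assuming a Q16.16 layout (16 integer bits, 16
fractional bits). The layout and the helper names are my own choice for
illustration; this is not how ffmpeg or the kernel does it. It shows the
extra care needed, not the speed:

```python
FRAC_BITS = 16              # assumed Q16.16: 16 fractional bits
ONE = 1 << FRAC_BITS        # fixed point representation of 1.0


def to_fixed(x):
    """Convert a float to Q16.16, rounding to the nearest step."""
    return int(round(x * ONE))


def to_float(f):
    """Convert a Q16.16 value back to a float."""
    return f / ONE


def fixed_mul(a, b):
    # The raw product carries 32 fractional bits; shift back down to 16.
    return (a * b) >> FRAC_BITS


def fixed_div(a, b):
    # Pre-scale the dividend so the quotient keeps 16 fractional bits.
    return (a << FRAC_BITS) // b


if __name__ == "__main__":
    a = to_fixed(1.5)
    b = to_fixed(2.25)
    print(to_float(fixed_mul(a, b)))   # 1.5 * 2.25
    print(to_float(fixed_div(a, b)))   # 1.5 / 2.25, truncated to Q16.16
```

Python integers never overflow, so this hides one real headache: in C
the intermediate product in fixed_mul needs a wider type (e.g. 64 bit)
than the operands, and every operation has to be checked for range by
hand.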

Video (mpeg, h.264 and such) massively benefits from the enhanced matrix
abilities of the SIMD hardware in your video card's GPU. These bare metal
resources are being integrated into gcc-5.1+ for experimentation. But
it is likely going to take a year or so before ordinary users of linux
resources see these performance gains. I would encourage you
to experiment, but *never on your main workstation*. I'm purchasing
a new nvidia video card just to benchmark and tune some numerically
intensive codes that use sci-libs/magma. Although this will be my
fastest video card at the moment, it will sit in a box that is not used
for visual eye candy (gaming, anime, ray tracing etc).


The mesos clustering codes (shark, storm, tachyon etc) and MP(I) codes are
going to fundamentally change the numerical processing landscape for even
small linux clusters. An excellent bit of code to get your feet wet is
sys-apps/hwloc. More than the FPU, MP(I) {sys-cluster/openmpi} and other
clustering codes are going to allow you to use the DDR(4|5) memory found in
many video cards (GPU) via *RDMA*. The world is rapidly changing and many
old "fixed point integer" folks do not see the tsunami that is just
offshore. Many computationally expensive codes have development projects to
move to an "in-memory" [1] environment where HD resources are avoided as
much as possible in a cluster environment. Clustered resources "tuned" for
such things as a video rendering farm will have very differently optimized
kernels than your KDE(G*) workstation or web server. media-gfx/blender is
another excellent collection of codes that benefits from all sorts of tuning
on a special-purpose system.

So do you really have a valid need to tune the FPU performance due to a
numerically demanding application? YMMV

> Best regards,
> Andrew Savchenko


hth,
James

[1] https://amplab.cs.berkeley.edu/


