---------- Forwarded message ---------
From: G 3 <programmingk...@gmail.com>
Date: Wed, Mar 4, 2020 at 1:35 PM
Subject: Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
To: BALATON Zoltan <bala...@eik.bme.hu>




On Mon, Mar 2, 2020 at 6:16 PM BALATON Zoltan <bala...@eik.bme.hu> wrote:

> On Mon, 2 Mar 2020, Richard Henderson wrote:
> > On 3/2/20 3:42 AM, BALATON Zoltan wrote:
> >>> The "hardfloat" option works (with other targets) only with IEEE 754
> >>> accumulative exceptions, when the most common of those exceptions,
> >>> inexact, has already been raised, and thus need not be raised a
> >>> second time.
> >>
> >> Why exactly is it done that way? What are the differences between IEEE FP
> >> implementations that prevent using hardfloat most of the time, instead of
> >> only using it in some (although supposedly common) special cases?
> >
> > While it is possible to read the host's IEEE exception word after the
> > hardfloat operation, there are two reasons that is undesirable:
> >
> > (1) It is *slow*.  So slow that it's faster to run the softfloat code
> > instead. I thought it would be easier to find the benchmark numbers that
> > Emilio generated when this was tested, but I can't find them.
>
> I remember those benchmarks too, and the paper Alex referred to confirmed
> the same thing. I've also found that enabling hardfloat for PPC without
> doing anything else is slightly slower (on a recent CPU; on older CPUs it
> could be even slower). Interestingly, however, it does give a speedup for
> vector instructions (maybe because they don't clear flags between each
> sub-operation). Does that mean these vector instruction helpers are also
> buggy regarding exceptions?
>

I am quite intrigued by these vector instructions. Apple was really big on
using them back in the day, so programs like QuickTime and iTunes definitely
use them. I'm not sure if the PowerPC's AltiVec vector instructions already
map to host vector instructions, but if they don't, mapping them would
give us a huge speedup in certain places. Would anyone know if this was
already done in QEMU?
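
For anyone following along, here is how I understand the hardfloat condition
described in the quoted text: the host FPU is only used when the guest's
sticky inexact flag is already set, so the host's exception word never has to
be read. This is just a rough sketch, not QEMU's actual code. Every demo_*
name is made up for illustration, it assumes round-to-nearest, it only tracks
the inexact flag, and the slow path uses Knuth's TwoSum as a stand-in for the
real softfloat code.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical guest FP state; only the sticky inexact flag is modeled. */
enum { DEMO_FLAG_INEXACT = 1 << 0 };

typedef struct {
    int flags;      /* guest's accumulated (sticky) exception flags */
    int fast_hits;  /* demo-only counter: additions done on the host FPU */
} demo_fp_status;

/* Only zero or normal inputs qualify; NaN, infinity and denormals do not. */
static bool demo_is_zero_or_normal(double x)
{
    int c = fpclassify(x);
    return c == FP_ZERO || c == FP_NORMAL;
}

/*
 * Stand-in for the softfloat path (assumption, not QEMU code).
 * Knuth's TwoSum recovers the rounding error of a + b in software, so
 * inexact is detected without reading the host's exception word.
 * Overflow and all other flags are ignored in this sketch.
 */
static double demo_slow_add(double a, double b, demo_fp_status *st)
{
    volatile double sum = a + b;
    volatile double bv = sum - a;
    volatile double err = (a - (sum - bv)) + (b - bv);

    if (isfinite(sum) && err != 0.0) {
        st->flags |= DEMO_FLAG_INEXACT;
    }
    return sum;
}

static double demo_add(double a, double b, demo_fp_status *st)
{
    assert(st != NULL);

    /* Fast path: inexact is already sticky, so raising it again is a no-op. */
    if ((st->flags & DEMO_FLAG_INEXACT)
        && demo_is_zero_or_normal(a) && demo_is_zero_or_normal(b)) {
        double r = a + b;  /* host FPU; host flags are never inspected */

        /* Results at the overflow/underflow edges need other flags. */
        if (isfinite(r) && (r == 0.0 || fabs(r) >= DBL_MIN)) {
            st->fast_hits++;
            return r;
        }
    }
    return demo_slow_add(a, b, st);
}
```

Starting from cleared flags, every demo_add() call takes the slow path until
some operation turns out to be inexact. After that, additions on zero or
normal inputs go straight to the host FPU.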
