Bug#575158: dpkg: Add new 'e500' architecture to triplettable and ostable

Sebastian Andrzej Siewior Sun, 18 Apr 2010 05:57:19 -0700

* Guillem Jover | 2010-04-16 09:01:16 [+0200]:

>Hi!
Hi Guillem,


>On Thu, 2010-02-18 at 11:38:34 +0100, Sebastian Andrzej Siewior wrote:
>> - variant two: an operation like a + b where we call into a library to
>>   compute the floating point operation. Here we would put the
>>   computation itself into a library like glibc/gcc which would use
>>   classic or embedded floating point depending on hwcap. Again the problem
>>   is how to pass the arguments. Plus we don't utilize all registers and have
>>   function calls for every "simple" operation. Not only do we have a
>>   new ABI here, we also make it slow for everyone.
>
>For this variant, it seems to me, the only sane way would be to use soft
>float ABI, by default make gcc use -mfloat-abi=soft, then build
>specialized hwcap versions of libgcc, libm, libc, and similar for classic
>and embedded fp with -mfloat-abi=softfp. So you'd get a different ABI
>than the current powerpc port, but at the same time this new port could
>be used everywhere.
This has been done on armel. I took a look at the GCC manual and I can
find these options only in the ARM section, and my compiler gives me an
unknown command error. Let's assume for a moment that it is also
supported on powerpc.

>The downsides would be AFAICS:
>
> * Slight overhead (how much?) due to function calls for fp operations,
>   and move of values from fpr to gpr on classic fp.
Let's get some numbers on this: I grabbed nbench-byte 2.2.3 from [0]
and compiled it in three different ways on an e500v2 based box:
- "normal" => this is e500v1 compatible code. So an add of two double
  variables ends up calling __adddf3() which performs the operation
  without using any e500v2 opcodes.
- "e500v2 funcs" => like "normal" but __adddf3(), for instance, is using
  e500v2 opcodes. I just made a function with that name which overrides
  the pure soft one. sin(), pow() and friends from libm are untouched so
  all double floating point operations there are inlined.
- "e500v2" => here we inline all double operations.

The complete results are at [1] and the tiny override I used for "e500v2
funcs" is at [2]. Below I removed the integer-only tests; the values are
the averages of all runs from the "Iterations/sec." column:

 TEST                 | e500v2  | e500v2 funcs |  normal
 ---------------------|---------|--------------|---------
 FOURIER              | 5220.30 |      4800.43 | 4017.13
 NEURAL NET           |    8.76 |         1.61 |    0.69
 LU DECOMPOSITION     |  287.10 |        50.54 |   19.21
 FLOATING-POINT INDEX |   10.75 |         3.33 |    1.71

If we take "FLOATING-POINT INDEX" for the comparison and take "e500v2" as
the base, then "e500v2 funcs" performs at ~31% of the original and "normal"
at ~16%. Based on these numbers, floating point intensive applications
will perform badly.
Unfortunately I don't have a "classic" powerpc around and coming up with a
new compiler for a test like this would eat at least a day. I expect it
to be worse than "e500v2" => "e500v2 funcs" because all 32 FPRs remain
totally unused in the program. Keeping floating point numbers in GPRs
leaves less room for other things like pointers, forcing them onto the
stack. The powerpc soft float ABI defines the type double to use two
GPRs. In hard float this type fits in one FPR, so this makes the
situation kinda worse: if the compiler wants to keep a variable in
a register across a function call it needs two registers.
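
To make the register argument a bit more concrete, here is a
back-of-the-envelope sketch. It assumes eight argument GPRs (r3..r10) and
eight argument FPRs (f1..f8), as in the 32-bit PowerPC SVR4 calling
convention; those counts are an assumption and are not stated in this
mail. The function name is hypothetical.

```python
# Assumed argument-register counts (32-bit PowerPC SVR4: r3..r10, f1..f8).
# These figures are an assumption for illustration, not taken from the mail.
ARG_GPRS = 8
ARG_FPRS = 8


def arg_gprs_left(n_doubles, soft_float):
    """Return how many argument GPRs stay free after passing n_doubles.

    Soft float: each double occupies two GPRs.
    Hard float: doubles go into FPRs and consume no GPRs at all.
    """
    if soft_float:
        used = min(ARG_GPRS, 2 * n_doubles)
    else:
        used = 0
    return ARG_GPRS - used


if __name__ == "__main__":
    print("doubles  soft-float GPRs left  hard-float GPRs left")
    for n in (1, 2, 4):
        print(f"{n:7d}  {arg_gprs_left(n, True):20d}  "
              f"{arg_gprs_left(n, False):20d}")
```

Under these assumptions four double arguments already use up every
argument GPR in the soft float case, while the hard float case still has
all of them free for pointers and integers.
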

> * On generic code (one not built specifically for e500), half of the
>   gpr would not get used.
Generic code with this ABI on a G3, for instance, uses all GPRs but _no_
FPRs. The Power arch defines 32 GPRs and 32 FPRs.

>The upsides would be:
>
> * Code should be ABI compatible. So one could actually rebuild the
>   arch for e500 only, if desired, and it would still be ABI compatible.
>   In the same way one can rebuild the i386 port for a Celeron, and it
>   should be ABI compatible, even if it will not work on older systems.
This is kinda true. If you use this on an embedded machine with no hard
disk it is unlikely that you will recompile it. If your CPU is slow or
your resources are limited you probably won't do it either. Plus you have
to track stable changes yourself.

> * If the performance is not too bad, it could even be considered to
>   replace the current powerpc architecture? (obviously after
>   discussion with the porters, etc)
I don't think that this will happen as I already pointed out the possible
performance loss. Additionally:
* binary only code will no longer work with this new ABI. If the vendor
  does not recompile its program it will no longer work on Debian and
  people who rely on this particular piece of software have to change
  the distribution.
* assembly optimized code which takes floating point arguments has to be
  adjusted. (Not that big an argument, but worth mentioning.)

> * Native implementations of fp code would be used for either, and no
>   emulation by the kernel would be needed, not even on FPU-less
>   PowerPCs.
Yes, that is true. However, I don't think a new port for FPU-less machines
is a big thing: changes to packages which call into foreign languages and
do not use libffi have to be made once. Packages in the code generator
category have to be touched anyway. So what is left is to automatically
recompile packages, and this is a matter of enough buildds and disk space
on the archive.

>Do you see this as a possible workable solution, or is it completely
>unacceptable? Did I miss something besides what I listed here?
I don't think it is acceptable due to the points I've added.

>Anyway in case the previous is nuts/suboptimal/unworkable/etc, here's
>the comments on the architecture name:
>
>On Thu, 2010-04-08 at 18:39:09 -0500, Moffett, Kyle D wrote:
>> >>  * The only chipset families that support "SPE" instructions are:
>> >>    * PowerPC e200
e200z3 and e200z6 according to [3].

>> >>    * PowerPC e500v1
>> >>    * PowerPC e500v2
>> >> 
>> >>  * The incompatibility between various SPE-capable CPUs mean that an arch
>> >> spec of "spe" or "powerpcspe" is probably insufficiently descriptive.
>> >
>> > Yes, "probably". Right now we don't see any.
>
>> >>  * The "e200" processor series is an automotive processor and has
>> >> insufficient storage to run even something like Emdebian Crush, let alone
>> >> to be able to build anything on its own.  It should therefore be excluded
>> >> from our discussion.  This means we just care about e500v{1,2} cores.
>
>Well, someone could get e200 licensed and build something generic
>enough to run Debian on it at some point, no?
Yes, this is possible, but I don't think it will happen. One point,
however: I looked at the PowerISA and the opcodes are described in the SPE
section. The Cell SPE is not mentioned there. So one could license the SPE
part and attach it to a 440 based core which has an APU interface. Or
build one's own core with this capability like Lemote did with MIPS.

>> > Right. The spec says, that e200z4 and e200z6 are binary compatible with
>> > e500. However, they also mention that double precision can only be
>> > achieved in software. So this looks like double precision opcodes result
>> > in an invalid opcode and we have to emulate them in kernel. This counts
>> > as binary compatible I guess.
>
>Exactly, compatibility here is a tricky word, for Debian architectures
>it tends to imply mostly compatible ABI (regarding instruction set,
>binary object format, calling conventions, kernel interface, etc).
>Well what the GNU triplet implies, actually.
>
>Regarding the CPU, as long as later CPUs are mostly backward
>compatible, and the kernel can abstract other differences from
>the system it should be fine, and using the same architecture is
>preferable in general.
Okay.

>> >>  * Freescale has indicated that they will not be building any more chipset
>> >> families including the SPE instructions, so we don't have to worry about
>> >> any newer chipset families.
>
>> >>  * We can't tell exactly how common or uncommon the e500v1 chipsets are
>> >> because Freescale's chipset comparison tables all just say "e500" without
>> >> referring to the version.  As a result, we should probably be safe rather
>> >> than sorry and refer to the version in the arch name (IE: e500v1/e500v2).
>
>> >>  * We should just call it just "e500v2":
>> >>    * Sufficiently descriptive of the hardware architecture
>
>I don't really see why the other ones should be left out, using a
>specific implementation to describe all the possibly supported
>implementations the architecture can handle seems wrong to me. In this
>case the describing attribute is the usage of SPE, which is what makes
>it "incompatible" from the standard powerpc port, and it's what's
>already on the GNU triplet, which would change accordingly in case an
>incompatible change to it would happen.
True.

>So if this is considered the way to go, I think using spe in the name
>would be better, which makes it generic, and kind of more future-proof
>than e500, and for the long name argument, using ppc should be fine, in
>the same way we have already ppc64. But then if you'd prefer powerpc
>that be fine with me too.
So we are back at powerpcspe, which is fine with me. I was only afraid of
mixing it up with the Cell's SPE. Now that Sony discontinued OtherOS for
the PS3 it should no longer be a problem :)

>Anyway, to be clear, I'm not trying to be imposing, you are the porters
>afterall, and the ones who will have to do the heavy lifting, just trying
>to get the facts right, as deciding on an architecture name, more so when
>it does not seem obvious, should be considered carefully, as having to
>change the name later on it's only going to be painful, more so if
>deployed systems have to be switched.
Yes. That's why I am here :)

So we agree on powerpcspe and the port will contain the complete SPE
extension including double precision support. If one needs a subset
he/she can pick a new name which denotes this and recompile. The only
things that have to be touched are code generators, and only if they are
required. So this is just a matter of enough compile HW.

>thanks,
>guillem

[0] http://www.tux.org/~mayer/linux/bmark.html
[1] http://download.breakpoint.cc/nbench/nbench-runs.txt
[2] http://download.breakpoint.cc/nbench/float.c
[3] http://www.ip-extreme.com/IP/power_e200.shtml

Sebastian



