Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)

2021-03-06 Thread Luke Kenneth Casson Leighton
On Saturday, March 6, 2021, Riccardo Mottola 
wrote:
> Hi Luke,
>
>
> Luke Kenneth Casson Leighton wrote:
>>
>> just to confirm: that's definitely "setting machine to capabilities that
>> the machine doesn't have, then requesting that feature and gcc 10 says
>> 'ok'" yes?
>>
>> i do not know the exact machine, let us assume it is -mg3.
>>
>> the options being passed are "gcc -mg3 -maltivec" and this should
>> definitely cause gcc to raise an error, is that correct?
>
> that is what the current test written by Adrian does, but I don't think
> it is the best solution.

right.  ok.  so by "autoconf" test i meant creating an actual program (even
if it is a one line assembly file) and executing it.

of course that relies on native building which in debian is the default,
but, argh i just realised that "native build host" in this case will be an
IBM POWER9 system which is effectively a cross compile scenario (similar to
using aarch64 to build armhf). unless the Program Compatibility Register is
set and that... wouldn't work either

argh! :)
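
For reference, a minimal sketch in C of what such a "build it and run it" test could look like. The names probe_survives and altivec_probe are made up for illustration, and the raw word 0x10000484 is assumed to be the encoding of "vor v0,v0,v0"; none of this comes from Adrian's patch. As noted above, it only describes the machine it runs on, so it is of no use when the build host is not the target:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

/* SIGILL handler: abandon the probe and jump back to probe_survives(). */
static void on_sigill(int signo)
{
    (void)signo;
    siglongjmp(probe_env, 1);
}

/* Returns 1 if probe() ran to completion, 0 if it raised SIGILL,
 * -1 if the handler could not be installed. */
int probe_survives(void (*probe)(void))
{
    struct sigaction sa, old;
    volatile int ok = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    /* savemask=1 so SIGILL is unblocked again after the jump. */
    if (sigsetjmp(probe_env, 1) == 0) {
        probe();
        ok = 1;
    }
    sigaction(SIGILL, &old, NULL);
    return ok;
}

/* Hypothetical AltiVec probe: on PowerPC it executes one vector
 * instruction, emitted as a raw word so the assembler does not need
 * -maltivec (encoding assumed, see above); elsewhere it does nothing. */
void altivec_probe(void)
{
#if defined(__GNUC__) && (defined(__powerpc__) || defined(__PPC__))
    __asm__ volatile(".long 0x10000484");
#endif
}

/* Two stand-in probes to exercise the helper on any POSIX host. */
void probe_noop(void) { }
void probe_raise_sigill(void) { raise(SIGILL); }
```

A configure-time check would call probe_survives(altivec_probe) and treat a result of 0 as "no AltiVec on this machine".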

> So I think the safest bet still would be a hard switch to enable/disable
> AltiVec builds.

yes i concur, i would however still consider this to be a bug in gcc (apart
from the 750 with/without altivec) if gcc is not excluding combinations for
which there is no known hardware.

sigh why on earth this was not placed behind dynamic runtime libraries a
long time ago, i do not fully understand.

l.


-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68


Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)

2021-03-06 Thread Riccardo Mottola

Hi Luke,


Luke Kenneth Casson Leighton wrote:


> just to confirm: that's definitely "setting machine to capabilities
> that the machine doesn't have, then requesting that feature and gcc 10
> says 'ok'" yes?
>
> i do not know the exact machine, let us assume it is -mg3.
>
> the options being passed are "gcc -mg3 -maltivec" and this should
> definitely cause gcc to raise an error, is that correct?


that is what the current test written by Adrian does, but I don't think 
it is the best solution.


Would we really get an error? In the case of the g3 I don't think so, 
strictly speaking.


I did test
-mcpu=750 -mtune=750 -maltivec

And I do not get an error. However, CPUs with a 750 core and AltiVec do
exist: even if they were never officially fitted in Macs, they were used
elsewhere, and perhaps upgrade boards exist (PPC 750 VX).

I might test with cores that cannot possibly have AltiVec, like G2 cores.

So I think the safest bet still would be a hard switch to enable/disable 
AltiVec builds.


Riccardo



Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)

2021-03-03 Thread Luke Kenneth Casson Leighton
On Tuesday, March 2, 2021, Luke Kenneth Casson Leighton 
wrote:
>
>
> On Tuesday, March 2, 2021, Riccardo Mottola 
wrote:
>
>> actually the original point is even for PPC32, not just PPC64. The
>> configure check added by Adrian in Firefox checks if the compiler
>> accepts -maltivec and just enables it in the build.
>> However, this assumption is not correct and causes issues as explained
>> in my previous mail.
>
> ouch.  seems like an autoconf test is needed, at least.  and an upstream
> bugreport to gcc.
>
> just to confirm: that's definitely "setting machine to capabilities that
> the machine doesn't have, then requesting that feature and gcc 10 says
> 'ok'" yes?
>
> i do not know the exact machine, let us assume it is -mg3.
>
> the options being passed are "gcc -mg3 -maltivec" and this should
> definitely cause gcc to raise an error, is that correct?

or, is it:

* just -mno-altivec
* no specific setting of machine type
* VMX instructions still get introduced

whilst i do not know if gcc rejects inline VSX assembly if -mno-altivec is
given, i have a sneaking suspicion that this could be something not to do
with gcc itself but with e.g. recent libc6 proliferation of inline assembly
variants of functions such as strncpy.

are you able to send a gdb stacktrace here to the list and also a disasm
dump at the PC showing which instruction is being attempted?

this will tell us what function is going awry and we can then ping the
right people.

l.






Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)

2021-03-02 Thread Luke Kenneth Casson Leighton
On Tuesday, March 2, 2021, Riccardo Mottola 
wrote:

> actually the original point is even for PPC32, not just PPC64. The
> configure check added by Adrian in Firefox checks if the compiler
> accepts -maltivec and just enables it in the build.
> However, this assumption is not correct and causes issues as explained
> in my previous mail.

ouch.  seems like an autoconf test is needed, at least.  and an upstream
bugreport to gcc.

just to confirm: that's definitely "setting machine to capabilities that
the machine doesn't have, then requesting that feature and gcc 10 says
'ok'" yes?

i do not know the exact machine, let us assume it is -mg3.

the options being passed are "gcc -mg3 -maltivec" and this should
definitely cause gcc to raise an error, is that correct?

l.






SIMD not present in Libre/Open hardware OpenPOWER implementations [was Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)]

2021-03-02 Thread Luke Kenneth Casson Leighton
changing subject, for reference / background:

* Paul Mackerras is working on an experimental branch to add VSX
 https://github.com/paulusmack/microwatt/blob/vecvsx/decode1.vhdl
 he was experimenting to see what was needed to get Fedora booting.  the
internal design is a Finite State Machine.  multiple clocks per instruction
(due to internal 64 bit pathways)

* neither A2I nor A2O have VSX and an estimate for adding it to these
high-performance gate-level designs would be around 2 years
https://github.com/openpower-cores/a2o
https://github.com/openpower-cores/a2i

* LibreSOC we are just not going to add VSX. the development cost is far
too high, the performance nowhere near that of Vectors, software complexity
far too high and L1 cache usage is compromised.
http://git.libre-soc.org

all of these designs - all four - have internal 64 bit pathways.  OpenPOWER
instruction decoding is complex even without SIMD (4,000 gates) and adding
VSX multiplies that by three or four.  that's enough gates to do a decent
embedded RISC core in any other RISC ISA.

IBM had years in which to incrementally extend SIMD operations.  Jeffrey in
another post kindly outlines the progression.

now, at POWER10 with 18 billion transistors, the barrier to entry is so
high that if someone doesn't put their foot down and say "no" to SIMD there
isn't going to *be* any new OpenPOWER hardware other than that from IBM
[not, at least, capable of running standard ppc64le distros that is]

we seriously considered doing an entirely new ppc64le-eabi-1.5/1.9 Debian
Distro port at one point, going *backwards* to the time when SIMD was not
mandatory but doing LE rather than BE, but the risk of it being viewed in
the same way as "Raspbian" is too great.


On Tuesday, March 2, 2021, Riccardo Mottola 
wrote:

>
> Emulation at kernel level is painfully slow,

seriously, i kid you not, it is infinitely better than trying to implement
VSX in hardware.  we would spend so long implementing it that it would
delay LibreSOC *beyond* the point where money from NLnet was available,
jeopardising the entire project in the process.

given a choice between "painfully slow right now but fixable in software
later" and "completely destroying any chance of completing and delivering
even any hardware at all" it's hardly a choice :)

the Cray-style Vectorisation being added will smoke SIMD in the long run,
once the ABIs and compilers are sorted.


> yes enabling runtime libraries could be done, requires extensive work in
> upstream code.

this is a better situation than an entire new distro port.  we may have to
have one anyway: all timescale estimates which start from defining a new
triplet and going from there are around 3-5 years.

if a new EABI has to be defined and spec'd as well it's even longer.

> An easier version is the path that TenFourFox and other follow: just
> provide two binaries, which is what I intend to do with ArcticFox.
> However if Debian wants to come up with the pain of two (or more?) FF
> packages

deep breath, this may be a sane medium term solution.  long term the
separation of SIMD is needed behind dynamic loadable libraries (and HWCAPS
in glibc6) rather than assuming it is 100% guaranteed available.
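
for the HWCAP route, a minimal sketch in C of asking the kernel at run time rather than trusting the compiler flags. getauxval() and AT_HWCAP are the existing glibc/Linux interface; the two bit values are copied from the kernel's PowerPC cputable.h so that the sketch builds anywhere, and the simd_level names are invented here purely for illustration:

```c
#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#endif

/* Bit values as defined for PowerPC in the Linux kernel's
 * asm/cputable.h; repeated here so the sketch builds on any host. */
#ifndef PPC_FEATURE_HAS_ALTIVEC
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000UL
#endif
#ifndef PPC_FEATURE_HAS_VSX
#define PPC_FEATURE_HAS_VSX     0x00000080UL
#endif

/* Hypothetical names for the three builds discussed in this thread. */
enum simd_level { SIMD_NONE = 0, SIMD_ALTIVEC = 1, SIMD_VSX = 2 };

/* Pure helper: map an AT_HWCAP word to the best available level. */
enum simd_level simd_level_from_hwcap(unsigned long hwcap)
{
    if (hwcap & PPC_FEATURE_HAS_VSX)
        return SIMD_VSX;
    if (hwcap & PPC_FEATURE_HAS_ALTIVEC)
        return SIMD_ALTIVEC;
    return SIMD_NONE;
}

/* Ask the running kernel; anything that is not Linux on PowerPC is
 * reported as having no SIMD in this sketch. */
enum simd_level detect_simd_level(void)
{
#if defined(__linux__) && (defined(__powerpc__) || defined(__PPC__))
    return simd_level_from_hwcap(getauxval(AT_HWCAP));
#else
    return SIMD_NONE;
#endif
}
```

a dispatcher could call detect_simd_level() once at startup and choose function pointers (or a library to dlopen) from the result.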

LibreSOC in particular needs to appear to go *backwards* in terms of
performance before it can go forwards, once the Cray-style Vectorisation
hits gcc properly.

then other hardware can also do variants of the same libraries (including
POWER9/10).


On Tuesday, March 2, 2021, Jeffrey Walton  wrote:

> Based on my experience with Botan and Crypto++... VSX is available
> with POWER7 and -mvsx compiler option. VSX is part of POWER8 core and
> does not need a compiler option.

as demonstrated by A2O/I in particular there is unfortunately a problem
with referring to IBM proprietary processor names as the canonical
definition of available features: A2I/O are Power v2.06/7 compliant but
*still do not have VSX*.

this is because the feature stays optional until you get to the
AIX Compliancy Subset.  see the first few pages of v3.0C or v3.1, copy easily
available at http://ftp.libre-soc.org

note that it really does say "SIMD is optional" for Linux/UNIX subset,
Floating Point Embedded subset and Fixed Point subset.

many people misinterpret / misread that document including myself for
several months.

this conflation is caused by the fact that only IBM processors, which
happen to go by proprietary names POWER7-10, are commonly available.  NXP
QorIQ, not so well known, which is v2.08B compliant, used in the PowerPC
Notebook, is going EOL.


> VSX is a lot like Intel tic/toc features. VSX allows a 64-bit vector
> loads and stores, but it does not provide operations on 64-bit
> vectors. You have to use POWER8 to get the 64-bit add (addudm),
> subtract (subudm), etc.

this illustrates very nicely the progression over time (many years) as the
team inside IBM ramped up the capabilities.

we can see very unfortunately that they too were seduced by what SIMD says

Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)

2021-03-02 Thread Riccardo Mottola
Hi Luke,


Luke Kenneth Casson Leighton wrote:
>
> 1.5 and 1.9 never had SIMD / VMX / VSX so there shouldn't be a problem
> (for G5).
>
> which, coming back to the original question, i'm not seeing a reason
> why disabling altivec should not compile.
>
> unless, of course, there have been assumptions "#ifdef PPC64 equals
> POWER9 therefore VSX" which are unfortunately creeping in ever since
> EABI 2.0 came about?

actually the original point is even for PPC32, not just PPC64. The
configure check added by Adrian in Firefox checks if the compiler
accepts -maltivec and just enables it in the build.
However, this assumption is not correct and causes issues as explained
in my previous mail.

Riccardo



Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)

2021-03-02 Thread Riccardo Mottola
Hi Gabriel,


Gabriel Paubert wrote:
> This is going to be hopelessly slow. The point of SIMD is to process
> quickly vast amounts of data, the overhead of every single emulated
> instruction is counted in hundreds of cycles.
>
> IMHO, the only solution is to:
> a) only use SIMD in library code
> b) compile 2 or 3 versions of libraries: no SIMD, VMX and/or VSX
> c) put each library in a different directory
> d) at run time, select the path to load the libraries from CPU
>capabilities

Emulation at kernel level is painfully slow, like FPU emulation - while
here you want maximum speed for that code. Already not having vector
instructions is a penalty, but an optimized build can still be usable.

yes enabling runtime libraries could be done, requires extensive work in
upstream code.

An easier version is the path that TenFourFox and others follow: just
provide two binaries, which is what I intend to do with ArcticFox.
However if Debian wants to come up with the pain of two (or more?) FF
packages

Riccardo



Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)

2021-03-02 Thread Riccardo Mottola
Hi,

Riccardo Mottola wrote:
>
> This causes a big issue if you want to compile a non-Altivec build on
> a capable processor like the G4: it will be enabled automatically even if
> you don't want it.
> E.g. if I want to build on a G4 a binary working on the G3, I can't. I
> specify -mcpu=750 -mtune=750, but the compiler will accept -maltivec
> and create an incompatible binary. 

I actually just tested, and the Firefox shipped by Debian hits an illegal
instruction on a G3, so it has the same kind of issue as my own
ArcticFox builds.

I also did a test and tried to compile on my G3 (and yes, it is an old
iBook which has a classic non-altivec 750 PPC, GX if I am right) and
-maltivec gets just accepted by gcc 10 now.

So, essentially Adrian's patch is wrong in concept: you cannot use the
compiler even on a native CPU to test for altivec

This is a double issue: a higher-spec CPU cannot be used to compile for
a lower-spec one, and a native build will fail as well.

I would try to follow two possibilities:
- just a "hard switch" like --enable-altivec --disable-altivec for PPC
- try to parse the optimization flags one might additionally use, so if the
compiler is invoked with "gcc -maltivec" (common practice) a
HAVE_ALTIVEC build is automatically enabled
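
To make the second possibility concrete, here is a rough sketch in C of the flag parsing (in a real configure script this would be shell; C is used only to keep the logic testable). The function name cflags_enable_altivec is hypothetical. It assumes the usual GCC behaviour that the last of -maltivec / -mno-altivec wins, and it only sees explicit flags, not anything a given -mcpu might imply:

```c
#include <string.h>

/* Returns 1 if the last AltiVec-related option in cflags is -maltivec,
 * 0 if it is -mno-altivec or if neither option appears. */
int cflags_enable_altivec(const char *cflags)
{
    int enabled = 0;
    const char *p = cflags;

    while (*p != '\0') {
        size_t len;

        p += strspn(p, " \t\n");      /* skip separators           */
        len = strcspn(p, " \t\n");    /* length of the next token  */

        if (len == 9 && strncmp(p, "-maltivec", 9) == 0)
            enabled = 1;
        else if (len == 12 && strncmp(p, "-mno-altivec", 12) == 0)
            enabled = 0;

        p += len;
    }
    return enabled;
}
```

With this, "-O2 -mcpu=750 -mtune=750" leaves HAVE_ALTIVEC off, while adding -maltivec to the flags turns it on.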

What do you think?

Riccardo




Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)

2021-03-01 Thread Jeffrey Walton
On Mon, Mar 1, 2021 at 3:39 AM Gabriel Paubert  wrote:
>
> On Sun, Feb 28, 2021 at 11:52:12PM +, Luke Kenneth Casson Leighton wrote:
> > On Monday, March 1, 2021, Riccardo Mottola 
> > wrote:
> > ...
> > Tulio Magno Quites Machado Filho is currently working on glibc6 patches
> > which reverse these erroneous assumptions, replacing them with "#ifdef VSX"
> > thus allowing people to compile code that does not rely on SIMD.
>
> Beware that VSX is not Altivec. Altivec was called VMX by IBM and
> VSX is a superset of Altivec (IIRC).

Based on my experience with Botan and Crypto++... VSX is available
with POWER7 and -mvsx compiler option. VSX is part of POWER8 core and
does not need a compiler option.

VSX is a lot like Intel tick/tock features. VSX allows 64-bit vector
loads and stores, but it does not provide operations on 64-bit
vectors. You have to use POWER8 to get the 64-bit add (addudm),
subtract (subudm), etc.

So a POWER7+VSX 64-bit add might look like:

typedef __vector unsigned int uint32x4_p;
typedef __vector unsigned long long uint64x2_p;

// Load 64-bit vector from uint64_t[2]
uint64x2_p a = vec_ld(...);
uint64x2_p b = vec_ld(...);

// But still perform the 32-bit add
uint64x2_p c = (uint64x2_p)VecAdd64((uint32x4_p)a, (uint32x4_p)b);

And:

uint32x4_p
VecAdd64(const uint32x4_p vec1, const uint32x4_p vec2)
{
    // The carry mask selects carries for elements 1 and 3 and sets
    // remaining elements to 0. The result is then shifted so the
    // carried values are added to elements 0 and 2.
#if defined(MYLIB_BIG_ENDIAN)
    const uint32x4_p zero = {0, 0, 0, 0};
    const uint32x4_p mask = {0, 1, 0, 1};
#else
    const uint32x4_p zero = {0, 0, 0, 0};
    const uint32x4_p mask = {1, 0, 1, 0};
#endif

    uint32x4_p cy = vec_addc(vec1, vec2);
    uint32x4_p res = vec_add(vec1, vec2);
    cy = vec_and(mask, cy);
    cy = vec_sld(cy, zero, 4);
    return vec_add(res, cy);
}

A POWER8 add looks as expected:

uint64x2_p
VecAdd64(const uint64x2_p vec1, const uint64x2_p vec2)
{
    return vec_add(vec1, vec2);
}

Even with the crippled 64-bit add using 32-bit elements, some
algorithms, like Bernstein's ChaCha, run about 2.5x faster than on
the scalar unit.
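
For anyone without the hardware to hand, the carry trick in the POWER7 version can be modelled in plain scalar C. This is only an illustration of the arithmetic for one 64-bit lane, not vector code, and the function name is made up:

```c
#include <stdint.h>

/* Add two 64-bit values using only 32-bit additions, mirroring what
 * the POWER7 VecAdd64 does for each pair of 32-bit elements: add the
 * halves independently, then feed the carry out of the low half into
 * the high half. */
uint64_t add64_via_32bit_halves(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo = a_lo + b_lo;      /* vec_add on the low element     */
    uint32_t hi = a_hi + b_hi;      /* vec_add on the high element    */
    uint32_t carry = (lo < a_lo);   /* vec_addc: carry out, 0 or 1    */

    hi += carry;                    /* mask, shift, then final vec_add */
    return ((uint64_t)hi << 32) | lo;
}
```

The result matches a native 64-bit add, including wrap-around at the top, which is what the mask-and-shift sequence achieves per lane.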

Jeff



enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)

2021-03-01 Thread Luke Kenneth Casson Leighton
On Monday, March 1, 2021, Gabriel Paubert  wrote:
> On Mon, Mar 01, 2021 at 12:22:22PM +, Luke Kenneth Casson Leighton
wrote:
>> ---
>> crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
>>
>> On Mon, Mar 1, 2021 at 8:39 AM Gabriel Paubert  wrote:
>>
>> > Beware that VSX is not Altivec. Altivec was called VMX by IBM and
>> > VSX is a superset of Altivec (IIRC).
>> >
>> > G4 and G5 do not have VSX.
>>
>> apologies i tend to lump these together.
>>
>> > This is going to be hopelessly slow.
>>
>> great!  i have absolutely no problem with that, at all.  the idea is
>> to give people access to something where due to the ongoing cascading
>> mistaken assumptions "nobody has any hardware except IBM POWER9 and
>> EABI 2.0 says VSX therefore #ifdef POWER9 --> enable VSX".
>>
>> it's a stopgap measure that at least allows... _something_.  breathing
>> space whilst the OpenPOWER Foundation puts together a plan.
>>
>> > The point of SIMD is to process quickly vast amounts of data,
>>
>> that was its seductive intent.  the reality is very different,
>> poisoning L1 I-Cache through massive bloating of program size, and in
>> some cases actually causing such heavy internal bus contention between
>> instruction and data reads that all processing grinds to a halt.
>>
>> https://www.sigarch.org/simd-instructions-considered-harmful/
>
> This publication claims (and probably rightly) that vector instructions
> are preferable to SIMD, but does not say at all that falling back to
> purely scalar is better.

i appreciate this is a side-track: LibreSOC is introducing a concept of
Cray-style "hardware for-loops" around the scalar ISA.  with gcc
autovectorisation the seemingly-scalar c code becomes as fast as the
hardware has available parallel ALUs.  hence the performance penalty is not
as great.

POWER9 on the other hand, if you've seen the proposed glibc6 patch to add
VSX to e.g. strncpy, it's alarming.  whilst the above article is
hypothetical, the real-world patch is a staggering 250 hand-coded assembly
instructions (the equivalent RVV is 13), dramatically reducing L1 cache
effectiveness and likely interfering with the use of memory bounds checkers
that align memory at the end of pages.

> Also, PPC SIMD has seen fewer variations than x86, which started with
> MMX (64bit), then SSE (128 bit registers, single precision only), SSE2
> (finally able to get rid of the awful x87 stacked registers) and so many
> extensions that I agree that it is impossible to track.

indeed.  all that is gone with Cray-style Vectors.


> Hmmm, G5 is BE only. No way to run LE, G4 and older are 32 bit BE (they
> could run LE also, but it's not easy).
>

understood.  ok so EABI 2.0 is out of the running, and EABI 1.9 is the
64-bit upgrade of 1.5, which is what debian-ppc64 (be) is based on.

1.5 and 1.9 never had SIMD / VMX / VSX so there shouldn't be a problem (for
G5).

which, coming back to the original question, i'm not seeing a reason why
disabling altivec should not compile.

unless, of course, there have been assumptions "#ifdef PPC64 equals POWER9
therefore VSX" which are unfortunately creeping in ever since EABI 2.0 came
about?

l.






Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)

2021-03-01 Thread Gabriel Paubert
On Mon, Mar 01, 2021 at 12:22:22PM +, Luke Kenneth Casson Leighton wrote:
> ---
> crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
> 
> On Mon, Mar 1, 2021 at 8:39 AM Gabriel Paubert  wrote:
> 
> > Beware that VSX is not Altivec. Altivec was called VMX by IBM and
> > VSX is a superset of Altivec (IIRC).
> >
> > G4 and G5 do not have VSX.
> 
> apologies i tend to lump these together.
> 
> > This is going to be hopelessly slow.
> 
> great!  i have absolutely no problem with that, at all.  the idea is
> to give people access to something where due to the ongoing cascading
> mistaken assumptions "nobody has any hardware except IBM POWER9 and
> EABI 2.0 says VSX therefore #ifdef POWER9 --> enable VSX".
> 
> it's a stopgap measure that at least allows... _something_.  breathing
> space whilst the OpenPOWER Foundation puts together a plan.
> 
> > The point of SIMD is to process quickly vast amounts of data,
> 
> that was its seductive intent.  the reality is very different,
> poisoning L1 I-Cache through massive bloating of program size, and in
> some cases actually causing such heavy internal bus contention between
> instruction and data reads that all processing grinds to a halt.
> 
> https://www.sigarch.org/simd-instructions-considered-harmful/

This publication claims (and probably rightly) that vector instructions
are preferable to SIMD, but does not say at all that falling back to
purely scalar is better.

Also, PPC SIMD has seen fewer variations than x86, which started with
MMX (64bit), then SSE (128 bit registers, single precision only), SSE2
(finally able to get rid of the awful x87 stacked registers) and so many
extensions that I agree that it is impossible to track. 

At least for PPC until now, it has been 128 bit registers, always.
> 
> 
> > the overhead of every single emulated
> > instructions is counted in hundreds of cycles.
> 
> > IMHO, the only solution is to:
> > a) only use SIMD in library code
> > b) compile 2 or 3 versions of libraries: no SIMD, VMX and/or VSX
> 
> this requires going backwards to EABI 1.5.  EABI 2.0 as currently
> defined *makes SIMD mandatory*.
> 
> given that debian PPC64 is BE EABI 1.5 but PPC64LE is LE EABI 2.0 i
> don't see how that's workable.

Hmmm, G5 is BE only. No way to run LE, G4 and older are 32 bit BE (they
could run LE also, but it's not easy).


> 
> unless you create a new triplet PPC64-LE-using-EABI-1.5
> 

I don't think so, stay with BE.

> also: multilib and it is being ripped out from distros.
> 
> > c) put each library in a different directory
> > d) at run time, select the path to load the libraries from CPU
> >capabilities
> 
> this is multiarch i believe.  it requires, as i recall, a
> syscall-level understanding of the two ABIs.  with ppc64 being BE and
> ppc64le being LE this would require word-order swapping at the syscall
> level.

You have to be BE anyway (kernel and userspace) to support the oldest 64
bit processors. The switch to LE occurred during Power7, but I believe
that real official distro support only happened with Power8.  

Locating libraries at program startup is done by ld.so, not by the kernel.


Gabriel
 



Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)

2021-03-01 Thread Luke Kenneth Casson Leighton

On Mon, Mar 1, 2021 at 8:39 AM Gabriel Paubert  wrote:

> Beware that VSX is not Altivec. Altivec was called VMX by IBM and
> VSX is a superset of Altivec (IIRC).
>
> G4 and G5 do not have VSX.

apologies i tend to lump these together.

> This is going to be hopelessly slow.

great!  i have absolutely no problem with that, at all.  the idea is
to give people access to something where due to the ongoing cascading
mistaken assumptions "nobody has any hardware except IBM POWER9 and
EABI 2.0 says VSX therefore #ifdef POWER9 --> enable VSX".

it's a stopgap measure that at least allows... _something_.  breathing
space whilst the OpenPOWER Foundation puts together a plan.

> The point of SIMD is to process quickly vast amounts of data,

that was its seductive intent.  the reality is very different,
poisoning L1 I-Cache through massive bloating of program size, and in
some cases actually causing such heavy internal bus contention between
instruction and data reads that all processing grinds to a halt.

https://www.sigarch.org/simd-instructions-considered-harmful/


> the overhead of every single emulated
> instructions is counted in hundreds of cycles.

> IMHO, the only solution is to:
> a) only use SIMD in library code
> b) compile 2 or 3 versions of libraries: no SIMD, VMX and/or VSX

this requires going backwards to EABI 1.5.  EABI 2.0 as currently
defined *makes SIMD mandatory*.

given that debian PPC64 is BE EABI 1.5 but PPC64LE is LE EABI 2.0 i
don't see how that's workable.

unless you create a new triplet PPC64-LE-using-EABI-1.5

also: that is multilib, and it is being ripped out of distros.

> c) put each library in a different directory
> d) at run time, select the path to load the libraries from CPU
>capabilities

this is multiarch i believe.  it requires, as i recall, a
syscall-level understanding of the two ABIs.  with ppc64 being BE and
ppc64le being LE this would require word-order swapping at the syscall
level.

l.



Re: enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)

2021-03-01 Thread Gabriel Paubert
On Sun, Feb 28, 2021 at 11:52:12PM +, Luke Kenneth Casson Leighton wrote:
> On Monday, March 1, 2021, Riccardo Mottola 
> wrote:
> 
> > A quick solution would to have this configure as a convenience, but have
> a way to pass an --enable-altivec and -disable-altivec (or with/without?)
> parameter to configure.
> 
> EABI v2.0 rather unfortunately, despite it being optional in the OpenPOWER
> Compliancy Suite, made SIMD mandatory.
> 
> EABI v1.5 does not require SIMD.
> 
> the problem is that the assumption "#ifdef POWER9" is bleeding through to
> many code repositories.
> 
> Tulio Magno Quites Machado Filho is currently working on glibc6 patches
> which reverse these erroneous assumptions, replacing them with "#ifdef VSX"
> thus allowing people to compile code that does not rely on SIMD.

Beware that VSX is not Altivec. Altivec was called VMX by IBM and
VSX is a superset of Altivec (IIRC).

G4 and G5 do not have VSX. 

> 
> unfortunately it is somewhat a lost cause because of the mistake made in
> EABI v2.  modifying EABI v2 to make SIMD optional is no longer possible
> because it would break backwards compatibility, the only option being to
> create a new triplet, then an entire new distro port, and that is a 3 to 5
> year process.
> 
> an alternative solution is to have a kernel-level emulator of SIMD
> instructions.
> https://bugs.libre-soc.org/show_bug.cgi?id=602

This is going to be hopelessly slow. The point of SIMD is to process
quickly vast amounts of data, the overhead of every single emulated
instruction is counted in hundreds of cycles.

IMHO, the only solution is to:
a) only use SIMD in library code
b) compile 2 or 3 versions of libraries: no SIMD, VMX and/or VSX
c) put each library in a different directory
d) at run time, select the path to load the libraries from CPU
   capabilities

There is a precedent for this in the x86 world, where there were i386
and i686 directories to support the PPro. It is still the case on the
machines where I have to install 32 bit libraries:

$ locate libx264.so
/usr/lib/i386-linux-gnu/i686/sse2/libx264.so.155
/usr/lib/i386-linux-gnu/libx264.so.155
/usr/lib/x86_64-linux-gnu/libx264.so.155

There are two 32 bit versions of the libx264, one for old processors and
one for processors with sse2.
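
As a rough illustration of steps c) and d), here is a sketch in C of choosing the directory from CPU capabilities. The subdirectory names "altivec" and "vsx" and the function pick_library_path are hypothetical; for ordinary linked libraries the dynamic loader would make this choice, so this mainly applies to code that dlopen()s its own plugins:

```c
#include <stdio.h>
#include <string.h>

/* Builds "<base>/<subdir>/<name>" into out, or "<base>/<name>" for the
 * baseline build. Returns 0 on success, -1 on bad input or if out is
 * too small to hold the whole path. */
int pick_library_path(char *out, size_t out_len, const char *base,
                      const char *name, int has_altivec, int has_vsx)
{
    const char *subdir = "";    /* baseline build, no SIMD */
    int n;

    if (out_len == 0 || strlen(name) == 0)
        return -1;

    /* Prefer the most capable variant the CPU supports. */
    if (has_vsx)
        subdir = "vsx/";
    else if (has_altivec)
        subdir = "altivec/";

    n = snprintf(out, out_len, "%s/%s%s", base, subdir, name);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

The two capability flags would come from whatever run-time detection is available on the target.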

Regards,
Gabriel


> 
> fascinatingly there is precedent for this in the form of sstep.c which
> triggers from illegal instruction trap and emulates some parts of the
> OpenPOWER ISA.
> 
> l.
> 
> 
> -- 
> ---
> crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
 



enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)

2021-02-28 Thread Luke Kenneth Casson Leighton
On Monday, March 1, 2021, Riccardo Mottola 
wrote:

> A quick solution would be to have this configure check as a convenience, but have
> a way to pass an --enable-altivec and --disable-altivec (or with/without?)
> parameter to configure.

EABI v2.0 rather unfortunately, despite it being optional in the OpenPOWER
Compliancy Suite, made SIMD mandatory.

EABI v1.5 does not require SIMD.

the problem is that the assumption "#ifdef POWER9" is bleeding through to
many code repositories.

Tulio Magno Quites Machado Filho is currently working on glibc6 patches
which reverse these erroneous assumptions, replacing them with "#ifdef VSX"
thus allowing people to compile code that does not rely on SIMD.
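
a small sketch in C of the difference, for illustration only (this is not taken from the glibc patches, and the enum and function names are made up). __VSX__ and __ALTIVEC__ are the macros gcc predefines when the corresponding feature is enabled for the compilation, so the code path follows the feature and not the processor name:

```c
/* Hypothetical labels for the code path that was compiled in. */
enum simd_path { PATH_SCALAR = 0, PATH_ALTIVEC = 1, PATH_VSX = 2 };

/* Select on the feature the compiler was told to use. A test such as
 * "#ifdef _ARCH_PWR9" would instead assume that a processor family
 * implies VSX, which is the assumption being removed. */
enum simd_path compiled_simd_path(void)
{
#if defined(__VSX__)
    return PATH_VSX;
#elif defined(__ALTIVEC__)
    return PATH_ALTIVEC;
#else
    return PATH_SCALAR;     /* always-available fallback */
#endif
}
```

built without any SIMD option this reports PATH_SCALAR, which is the case that has to keep working.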

unfortunately it is somewhat a lost cause because of the mistake made in
EABI v2.  modifying EABI v2 to make SIMD optional is no longer possible
because it would break backwards compatibility, the only option being to
create a new triplet, then an entire new distro port, and that is a 3 to 5
year process.

an alternative solution is to have a kernel-level emulator of SIMD
instructions.
https://bugs.libre-soc.org/show_bug.cgi?id=602

fascinatingly there is precedent for this in the form of sstep.c which
triggers from illegal instruction trap and emulates some parts of the
OpenPOWER ISA.

l.




enabling/disabling AltiVec in Firefox and derived browsers (ArcticFox)

2021-02-28 Thread Riccardo Mottola

Hello,

I was checking out a specific patch Adrian made for Firefox, which 
should help building on non-Altivec capable CPUs.

https://github.com/mozilla/gecko-dev/commit/c6b39f0f902898988ca7793af56307640ff81362

I have imported it in ArcticFox with this commit and tested it.
https://github.com/rmottola/Arctic-Fox/commit/1e3eb367dcfd6c9f61c738443b7967aa5fd7dae9

This configure test relies on the fact that the compiler will throw 
an error if -maltivec is used and not supported.


This causes a big issue if you want to compile a non-Altivec build on 
a capable processor like the G4: it will be enabled automatically even if 
you don't want it.
E.g. if I want to build on a G4 a binary working on the G3, I can't. I 
specify -mcpu=750 -mtune=750, but the compiler will accept -maltivec 
and create an incompatible binary.


A quick solution would be to have this configure check as a convenience, but 
have a way to pass an --enable-altivec and --disable-altivec (or 
with/without?) parameter to configure.


I am not even sure if it is still true that the compiler will reject 
the option, I will test on my G3.


Another proposal would be to parse the CFLAGS and check if -maltivec 
was specified and thus enable HAVE_ALTIVEC



Other proposals? Let's discuss.


Riccardo