On 08/19/2016 02:20 PM, C Bergström wrote:
Sorry to be the party crasher, but...

I'd love to have optimizations for everything out there, but it takes
a lot of work to fine tune for something specific.

Agreed. Right now on ARMv8 alone, there are dozens of teams working on the identical concepts presented in this thread. Most are also targeting specific domains. At some point there will be pathways, just like in Computational Chemistry, where the optimization pathway for new silicon is fast and previous work helps tremendously. That is, you are not alone in your quests; far, far from it.


Right now I see a few variants of ARMv8
------------
ARM reference stuff - A57 cores and the newer bits.. The scheduling
and stuff seems more-or-less similar enough that one tuning could
probably work for the vast majority of these parts.

Cavium ThunderX - It's ground up and quite different from the ARM
reference stuff under the hood

APM - Mustang, again ground up and different. I don't have enough
hands on to know how different from reference.

Broadcom - Coming Soon(tm) - Again no hands on or any data, but
certainly very interesting..

... now add in every variant of ground up implementation and you have
50 shades of gray..

And billions of dollars financing those efforts in parallel. It's an arms race (like the pun?). Wonder why a Japanese conglomerate offered to purchase ARM Ltd. for such a large figure? Wonder why Intel has ARM licenses now? Your group might only be able to focus on a few ARM offerings, but there are dozens and dozens of ARM teams alone that would dispute your arithmetic above.

-------------
Soo.. depending on your target hardware, you may be better off with
gcc if the end goal is general all-around performance. (It does a
quite respectable job of being generic) I realize a lot of people have
strong feelings for or against it. I leave that to the reader to
decide..

You misconstrue concepts. Nobody, especially me, implies that one pathway (to a Unikernel [2] if you like) suits all near-optimized solutions. That would be pointless. What you allude to already exists at some of the more progressive data/cloud vendors. We are talking about a unikernel for different classes of problems, across ARMv8, x86-64 and GPU architectures, not thousands of (arch) processor variants. However, those other processor (arch) variants, and the folks that earn a living off of those variants, are not sitting back idle either.


Back to my own glass house.. It will take a few years, but I am trying
to make it easier (internally) to expose in some clear way all the
pieces which compose a fine tuning per-processor. If this was "just"
scheduling models it would be really easy, but it's not.. Those
latencies and other magic bits decide things like.. "should I unroll
this loop or do something else" and then you venture into the land of
accelerators where a custom regalloc may be what you really need and
*nothing* off the shelf fits to meet your goals.. (projects like that
can take 9 months and in the end only give a general 1-5% median
performance gain..)
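
One concrete way latencies feed into the unrolling decision: for a reduction loop, the number of independent accumulators needed to keep a pipelined unit busy is roughly the instruction latency times the issue rate. A minimal sketch in Python, assuming made-up core names and numbers (none of these figures come from a real part, and the helper names are hypothetical):

```python
# Toy model: how many independent accumulators (i.e. how much unrolling)
# a reduction loop needs to hide floating-point latency.
# Core names and numbers below are hypothetical, for illustration only.

def accumulators_needed(latency_cycles: int, issues_per_cycle: int) -> int:
    """Independent dependency chains needed to keep the unit busy."""
    return latency_cycles * issues_per_cycle


HYPOTHETICAL_CORES = {
    "core_a": {"fma_latency": 4, "fma_per_cycle": 2},
    "core_b": {"fma_latency": 6, "fma_per_cycle": 1},
}


def unroll_factors(cores: dict) -> dict:
    """Map each core name to the unroll factor this toy model suggests."""
    return {
        name: accumulators_needed(c["fma_latency"], c["fma_per_cycle"])
        for name, c in cores.items()
    }


if __name__ == "__main__":
    for name, factor in unroll_factors(HYPOTHETICAL_CORES).items():
        print(f"{name}: unroll reduction by {factor}")
```

A real compiler weighs far more than this (register pressure, code size, memory bandwidth), which is why a single scheduling model is not enough.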

If this is your mantra, I rescind the generous comments. Cray used to work that way, milking the Petroleum Industry for tons of money, but things have changed and the change is accelerating rapidly. Perhaps too many of those Cray patents that your company owns are leaking toxins into the brain-trust where you park?

Vendor walk-back is sad, imho. ymmv.

Best of luck to your company's 5-year plan....


[2] http://unikernel.org/

hth,
James


--------------


On Sat, Aug 20, 2016 at 2:02 AM, james <gar...@verizon.net> wrote:
On 08/19/2016 11:15 AM, C Bergström wrote:

On Fri, Aug 19, 2016 at 11:01 PM, Luca Barbato <lu_z...@gentoo.org> wrote:

BTW is pathscale ready to be used as system compiler as well?


I wish, but no. We have known issues when building grub2, glibc and
the Linux kernel at the very least. Someone* did report a long time
ago that, with their unofficial port, they were able to build and boot
the NetBSD kernel.
(*A community dev we trusted with our sources and was helping us with
portability across platforms)

The stuff with grub2 may potentially be fixed in the "near" future...
the others are more tricky. In general if clang can do it, we have a
strong chance as well.

As a philosophy - "we" aren't really trying to be the best generic
compiler in the world. We aim more at optimizing as much as possible
for known targets. So if by system you mean a compiler that would
produce an "OS" which only runs on a single class of hardware, then
yeah, it could work at some point in the future. Specifically, on x86
we default to host CPU optimizations. So on newer Intel hardware it's
easy to get a binary that won't run on AMD or older 64bit Intel.
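
The failure mode here is a feature-set mismatch: a binary built for the host may use instructions the other machine lacks. A minimal sketch in Python, assuming hypothetical feature sets (the flag names follow the Linux /proc/cpuinfo spelling, but the sets and helper names are illustrative, not data for any specific CPU):

```python
# Toy check: can a binary that requires certain CPU features run on a host?
# The feature sets below are hypothetical examples, not real CPU data.

def missing_features(required, host):
    """Return the features the binary needs that the host does not have."""
    return sorted(set(required) - set(host))


def can_run(required, host):
    """True if every required feature is present on the host."""
    return not missing_features(required, host)


# Built with host-CPU optimizations on a newer machine (hypothetical).
BINARY_REQUIRES = {"sse2", "sse4_2", "avx2", "fma"}

NEWER_HOST = {"sse2", "sse4_2", "avx2", "fma"}
OLDER_HOST = {"sse2", "sse4_2"}

if __name__ == "__main__":
    print(can_run(BINARY_REQUIRES, NEWER_HOST))
    print(missing_features(BINARY_REQUIRES, OLDER_HOST))
```

A generic build avoids this by requiring only a baseline feature set, at the cost of leaving newer instructions unused.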

More recently on ARMv8 - we turn on processor specific tuning. So
while it may "run", the difference between APM's Mustang and Cavium
ThunderX is pretty big, and running binaries intended for A on B
would certainly take a hit.. (this is just the tip of the iceberg)

For general scalar OS code it isn't likely to matter... the real
impact being like 1-10% difference (being very general.. it could be
less or more in the real world..)

For HPC codes or anything where you get loops or computationally
complex code - the gloves are off and I could see big differences...
(again being general and maybe a bit dramatic for fun)



OK (actually fantastic!). Looking at the pathscale site pages and github,
perhaps a cheap arm embedded board, where llvm is the centerpiece of
compiling a minimal system to entice gentoo-llvm testers, would be possible
in the near future? I have a 96boards HiKey (arm64, ARMv8) that I could
dedicate to gentoo+armv8-llvm testing, if that'd help. [1]

Perhaps a baseline bootstrap iso (or such) version targeted at
llvm-centric testers on x86-64 or armv8? Skip grub2 and use grub-legacy or
lilo or (?), since there seem to be issues with llvm-grub2.


[1] http://dev.gentoo.org/~tgall/


No matter how you slice it, from someone who is focused on building
minimized and embedded (bare metal) systems that are customized and
coalesced into a heterogeneous gentoo cluster for HPC, this is wonderful
news. Finally a vendor in the cluster space with some vision and
common sense, imho. Heterogeneous and open HPC is where it's at, imho. If
there is a forum where the community and pathscale folks discuss issues,
point that out, as I could not find one for deeper reading....


hth,
James




