gcc-6-20170316 is now available

2017-03-16 Thread gccadmin
Snapshot gcc-6-20170316 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/6-20170316/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-6-branch 
revision 246212

You'll find:

 gcc-6-20170316.tar.bz2   Complete GCC

  SHA256=64a7e07bb163df01713c19526b34696d55966bfff3d6f6362edae2f17e4937cf
  SHA1=e5c5391596e97ccad9bc8ee6241f95662041539c

Diffs from 6-20170309 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Obsolete powerpc*-*-*spe*

2017-03-16 Thread Andrew Jenner

On 16/03/2017 21:11, Segher Boessenkool wrote:

>> The e200z3 upwards have SPE units. None of them have classic FP. So it
>> would make most sense for the e200/VLE support to be part of the SPE
>> backend rather than the classic PowerPC backend.
>
> Great to hear!  And all e300 are purely "classic"?


That's one I'm less familiar with (as we don't deliver a multilib for
it), but yes - my understanding is that this is a classic core.


Andrew
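
The core classification discussed in this thread can be summarised as a
small lookup table.  This is only a sketch of the proposed split as
described in the messages, not GCC code: the enum, struct and function
names are hypothetical, the e300 entry rests on Andrew's stated
understanding, and placing e200z0 on the SPE/VLE side follows his
suggestion that e200/VLE support belongs with the SPE backend.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Which port a core would land on after the proposed split.
   Illustrative names only; nothing here exists in GCC. */
enum port_side { PORT_CLASSIC_RS6000, PORT_SPE_VLE, PORT_UNKNOWN };

struct core_info {
    const char *name;
    int has_spe;            /* SPE unit present, as stated in the thread */
    enum port_side side;
};

static const struct core_info cores[] = {
    { "e200z0", 0, PORT_SPE_VLE },        /* VLE only; SPE starts at e200z3 */
    { "e200z3", 1, PORT_SPE_VLE },
    { "e200z4", 1, PORT_SPE_VLE },
    { "e500v1", 1, PORT_SPE_VLE },
    { "e500v2", 1, PORT_SPE_VLE },
    { "e300",   0, PORT_CLASSIC_RS6000 }, /* "classic", per Andrew */
    { "e500mc", 0, PORT_CLASSIC_RS6000 }, /* dropped SPE, regained classic FP */
    { "e5500",  0, PORT_CLASSIC_RS6000 },
    { "e6500",  0, PORT_CLASSIC_RS6000 },
};

/* Return the table entry for NAME, or NULL if the core is not listed. */
const struct core_info *find_core(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof cores / sizeof cores[0]; i++)
        if (strcmp(cores[i].name, name) == 0)
            return &cores[i];
    return NULL;
}

/* Return the side of the split NAME would fall on. */
enum port_side port_for_core(const char *name)
{
    const struct core_info *c = find_core(name);
    return c ? c->side : PORT_UNKNOWN;
}
```

For example, port_for_core("e500v2") yields PORT_SPE_VLE, while
port_for_core("e500mc") yields PORT_CLASSIC_RS6000.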


Re: Obsolete powerpc*-*-*spe*

2017-03-16 Thread Segher Boessenkool
On Thu, Mar 16, 2017 at 08:38:37PM +0000, Andrew Jenner wrote:
> >>Are you proposing to take on the task of actually splitting it yourself?
> >>If so, that would make me a lot happier about it.
> >
> >Yes, I can do the mechanics.  But I cannot do most of the testing.
> 
> That's fine (and what I expected).
> 
> >And
> >this does not include any of the huge simplifications that can be done
> >after the split: both ports will be very close to what we have now,
> >immediately after the split.
> 
> I'd have thought that the simplifications would be the bulk of the 
> work...

The simplifications are not necessary to make things work.  They can
all be done piecemeal, and later (we should do the split during early
stage1 if possible).

I cannot promise you much of IBM's time (or my own abundant spare time),
but we will of course be available for advice and questions etc.

It is not like removing 20k or 30k lines is as much work as writing
them, of course ;-)

> The simplification of the classic PowerPC port would be the 
> removal of the SPE code. What would be removed from the SPE port - 
> anything other than Altivec and 64-bit?

Don't forget the other vector stuff, VSX.  It is not small or simple.

> >>All the e200 cores apart from e200z0 can execute 32-bit instructions as
> >>well as VLE, though we'll always generate VLE code when targetting them
> >>(otherwise they're fairly standard).
> >
> >Do any e200 support SPE, or classic FP?
> 
> The e200z3 upwards have SPE units. None of them have classic FP. So it 
> would make most sense for the e200/VLE support to be part of the SPE 
> backend rather than the classic PowerPC backend.

Great to hear!  And all e300 are purely "classic"?


Segher


Re: Obsolete powerpc*-*-*spe*

2017-03-16 Thread Andrew Jenner

Hi Segher,

On 16/03/2017 19:24, Segher Boessenkool wrote:

> e500mc (like e5500, e6500) are just PowerPC (and they use the usual ABIs),
> so those should stay on the "rs6000 side".

Agreed.

>> Are you proposing to take on the task of actually splitting it yourself?
>> If so, that would make me a lot happier about it.
>
> Yes, I can do the mechanics.  But I cannot do most of the testing.

That's fine (and what I expected).

> And
> this does not include any of the huge simplifications that can be done
> after the split: both ports will be very close to what we have now,
> immediately after the split.

I'd have thought that the simplifications would be the bulk of the
work... The simplification of the classic PowerPC port would be the
removal of the SPE code. What would be removed from the SPE port -
anything other than Altivec and 64-bit?

>> All the e200 cores apart from e200z0 can execute 32-bit instructions as
>> well as VLE, though we'll always generate VLE code when targetting them
>> (otherwise they're fairly standard).
>
> Do any e200 support SPE, or classic FP?


The e200z3 upwards have SPE units. None of them have classic FP. So it 
would make most sense for the e200/VLE support to be part of the SPE 
backend rather than the classic PowerPC backend.


Andrew


Re: Obsolete powerpc*-*-*spe*

2017-03-16 Thread Segher Boessenkool
Hi Andrew,

On Wed, Mar 15, 2017 at 09:43:20PM +0000, Andrew Jenner wrote:
> On 15/03/2017 14:26, Segher Boessenkool wrote:
> >I do not think VLE can get in, not in its current shape at least.
> 
> That's unfortunate. Disregarding the SPE splitting plan for a moment, 
> what do you think would need to be done to get it into shape? I had 
> thought we were almost there with the patches that I sent to you and 
> David off-list last year.
> 
> > VLE
> >is very unlike PowerPC in many ways so it comes at a very big cost to
> >the port (maintenance and otherwise -- maintenance is what I care about
> >most).
> 
> I completely understand.

That answers your previous question, too.

> >Since SPE and VLE only share the part of the rs6000 port that doesn't
> >change at all (except for a bug fix once or twice a year), and everything
> >else needs special cases all over the place, it seems to me it would be
> >best for everyone if we split the rs6000 port in two, one for SPE and VLE
> >and one for the rest.  Both ports could then be very significantly
> >simplified.
> >
> >I am assuming SPE and VLE do not support AltiVec or 64-bit PowerPC,
> >please correct me if that is incorrect.  Also, is "normal" floating
> >point supported at all?
> 
> My understanding is that SPE is only present in the e500v1, e500v2 and 
> e200z[3-7] cores, all of which are 32-bit only and do not have classic 
> floating-point units. SPE and Altivec cannot coexist as they have some 
> overlapping instruction encodings. The successor to e500v2 (e500mc) 
> reinstated classic floating-point and got rid of SPE.

e500mc (like e5500, e6500) are just PowerPC (and they use the usual ABIs),
so those should stay on the "rs6000 side".

> >Do you (AdaCore and Mentor) think splitting the port is a good idea?
> 
> It wouldn't have been my preference, but I can understand the appeal of 
> that plan for you. I'm surprised that the amount of shared code between 
> SPE and PowerPC is as little as you say, but you have much more 
> experience with the PowerPC port than I do, so I'll defer to your 
> expertise on that matter.
> 
> Are you proposing to take on the task of actually splitting it yourself? 
> If so, that would make me a lot happier about it.

Yes, I can do the mechanics.  But I cannot do most of the testing.  And
this does not include any of the huge simplifications that can be done
after the split: both ports will be very close to what we have now,
immediately after the split.

> >> -te200z0
> >> -te200z3
> >> -te200z4
> >
> > These are VLE?
> 
> Yes.
> 
> > Do some of those also support PowerPC?
> 
> All the e200 cores apart from e200z0 can execute 32-bit instructions as 
> well as VLE, though we'll always generate VLE code when targetting them 
> (otherwise they're fairly standard).

Do any e200 support SPE, or classic FP?


Segher


GCN back-end branch

2017-03-16 Thread Martin Jambor
Hello,

After working on the GCN back-end in a private branch, we would like to make
it public and invite the community to have a look, comment, review or
even contribute.  Therefore we have just pushed the current state of
the back-end to the git branch gcn (see
https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/gcn or
fetch it as any other git branch).

We have decided not to have ChangeLog.gcn files but if you wish to
contribute, please make standard changelog entries part of commit
messages.  Additionally, the basic git collaboration rules should
apply, most notably make sure you do not do non-fast-forward pushes to
the branch, start your commit messages with one-line brief summaries
and so forth.  Any patches against the branch should be sent to
gcc-patches, and while I think that full-blown reviews are not
necessary at this stage, please coordinate with me and Honza before
you commit anything.  I will be making regular merges from trunk.

At this point, the back-end can compile small kernels open-coded in C
with target-specific attributes, built-ins and address spaces to make
use of the various special characteristics of the architecture.
Eventually, it should of course provide for high-level programming
models, most notably OpenMP, but the list of steps we need to take
before we get there is very long.

The changelog of the branch initial commit is below.  Apart from a new
machine description, it also contains a few modifications to the
compiler proper, most of which are needed to increase the limit on
size of scalar types and the number of arguments of an instruction
(which are actually not strictly necessary now but we have bumped into
them during development).  We plan to commit generally useful generic
changes early in stage1.

So far we have tested the output of the branch only on AMD APUs; we have
not tested on discrete GPUs yet.  To run the kernels, you need quite a
few more pieces in your software stack in addition to our branch and
the hardware.  Most notably, you currently need:

  1) an AMDGPU-LLVM-based assembler,
  2) the amdphdrs utility from
     https://github.com/RadeonOpenCompute/LLVM-AMDGPU-Assembler-Extra, and
  3,4,5) the ROCK kernel, the ROCT thunk interface library and the ROCR
     run time library, which you can get from
     https://github.com/RadeonOpenCompute (or currently from
     http://download.opensuse.org/repositories/home:/jamborm:/roc-1.3/openSUSE_Tumbleweed
     if you use openSUSE Tumbleweed; so far I have packaged only
     version 1.3, but it has been sufficient).

The work-flow is as follows: configure the branch with
--target=amdgcn-unknown-amdhsa and use it to compile the kernel into
assembly, feed that to the llvm-mc amdgcn assembler, and then use the
amdphdrs tool to convert the resultant object file to an AMD HSA "code
object", which the ROCR run time can then load and execute.  Honza and
I hope to come up with an article demonstrating what can already be
done with the branch soon, but that is clearly out of scope of this
already too long announcement.  We plan to write a wiki page with some
examples and more detailed descriptions of some basic problems with
modeling GCN in GCC.

Thus, let me conclude by saying that I'm looking forward to taking on
the many challenges this architecture will present for GCC and I would
like to invite everyone interested to help tackle them,

Martin



2017-03-10  Jan Hubicka  
Martin Jambor  

* config.sub: Added amdgcn cases.

gcc/
* common/config/gcn/gcn-common.c: New file.
* config/gcn/constraints.md: Likewise.
* config/gcn/gcn-builtins.def: Likewise.
* config/gcn/gcn-c.c: Likewise.
* config/gcn/gcn-hsa.h: Likewise.
* config/gcn/gcn-modes.def: Likewise.
* config/gcn/gcn-protos.h: Likewise.
* config/gcn/gcn-valu.md: Likewise.
* config/gcn/gcn.c: Likewise.
* config/gcn/gcn.h: Likewise.
* config/gcn/gcn.md: Likewise.
* config/gcn/gcn.opt: Likewise.
* config/gcn/predicates.md: Likewise.
* config/gcn/t-gcn-elf: Likewise.
* ira.c (ira_init_register_move_cost): Also check that
contains_reg_of_mode.
* combine.c (gen_lowpart_or_truncate): Return clobber if there is
not an integer mode of the same size as x.
(gen_lowpart_for_combine): Fail if there is no integer mode of the
same size.
* config.gcc: Added amdgcn cases.
* emit-rtl.c (get_mem_align_offset): Return zero for overaligned
memory.
* explow.c (memory_address_addr_space): Call memory_address_addr_space
if a representation by a single register is invalid.
* expr.c (expand_expr_real_1): Disable converting operand to fields or
BLK mode.
* ira-costs.c (setup_allocno_class_and_costs): Do not assert that
cost_classes_ptr->hard_regno_index is non-negative.
* lra-constraints.c (process_alt_operands): Do not penalize constants.
(curr_insn_transfor

Re: [RFC] Support register groups in inline asm

2017-03-16 Thread Andrew Senkevich
2017-03-16 9:50 GMT+01:00 Richard Biener :
> On Wed, 15 Mar 2017, Andrew Senkevich wrote:
>
>> 2016-12-05 16:31 GMT+01:00 Andrew Senkevich :
>> > 2016-11-16 8:02 GMT+03:00 Andrew Pinski :
>> >> On Tue, Nov 15, 2016 at 9:36 AM, Andrew Senkevich
>> >>  wrote:
>> >>> Hi,
>> >>>
>> >>> new Intel instructions AVX512_4FMAPS and AVX512_4VNNIW introduce use
>> >>> of register groups.
>> >>>
>> >>> To support register groups feature in inline asm needed some extension
>> >>> with new constraints.
>> >>>
>> >>> Current proposal is the following syntax:
>> >>>
>> >>> __asm__ ("SMTH %[group], %[single]" :
>> >>>          [single] "+x"(v0) :
>> >>>          [group] "Yg4"(v1), "1+1"(v2), "1+2"(v3), "1+3"(v4));
>> >>>
>> >>> where "YgN" constraint specifies group of N consecutive registers
>> >>> (which is started from register having number as "0 mod
>> >>> 2^ceil(log2(N))"),
>> >>> and "1+K" specifies the next registers in the group.
>> >>>
>> >>> Is this syntax ok? How to implement it?
>> >>
>> >>
>> >> Have you looked into how AARCH64 back-end handles this via OI, etc.
>> >> Like:
>> >> /* Oct Int: 256-bit integer mode needed for 32-byte vector arguments.  */
>> >> INT_MODE (OI, 32);
>> >>
>> >> /* Opaque integer modes for 3 or 4 Neon q-registers / 6 or 8 Neon d-registers
>> >>    (2 d-regs = 1 q-reg = TImode).  */
>> >> INT_MODE (CI, 48);
>> >> INT_MODE (XI, 64);
>> >>
>> >>
>> >> And then it implements the TARGET_ARRAY_MODE_SUPPORTED_P target hook?
>> >> And the x2 types are defined as a struct of an array like:
>> >> typedef struct int8x8x2_t
>> >> {
>> >>   int8x8_t val[2];
>> >> } int8x8x2_t;
>> >
>> > Thanks!
>> >
>> > We have to update proposal with changing "+" symbol to "#" specifying
>> > offset in a group (to avoid overloading the other meaning of “+”
>> > specifying that operand is both input and output).
>> >
>> > So current proposal of syntax is:
>> >
>> > __asm__ ("INSTR %[group], %[single]" :
>> >          [single] "+x"(v0) :
>> >          [group] "Yg4"(v1), "1#1"(v2), "1#2"(v3), "1#3"(v4));
>> >
>> > where "YgN" constraint specifies group of N consecutive registers
>> > (which is started from register having number as "0 mod 2^ceil(log2(N))"),
>> > and "1#K" specifies the next registers in the group.
>> >
>> > Some other questions or comments?
>> >
>> > What about consensus on this syntax?
>>
>> Hi Richard!
>>
>> Can we have agreement on this syntax, what do you think?
>
> I have no expertise / opinion here.

Hi Jeff, are you the proper person to ask?


--
WBR,
Andrew
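
The register-numbering rule in the proposal above (a group of N
registers starts at a register whose number is 0 mod 2^ceil(log2(N)),
and "1#K" names the K-th following register) can be worked through in
plain C.  This is only an arithmetic sketch of the rule as worded in
the proposal; "YgN" and "1#K" are proposed syntax, not existing GCC
constraints, and the function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Smallest power of two >= n, i.e. 2^ceil(log2(n)) for n >= 1.
   The upper bound only keeps the shift from overflowing. */
unsigned group_alignment(unsigned n)
{
    assert(n >= 1 && n <= 256);
    unsigned align = 1;
    while (align < n)
        align <<= 1;
    return align;
}

/* True if register number REGNO may start a group of N registers,
   i.e. REGNO is 0 mod 2^ceil(log2(N)). */
bool can_start_group(unsigned regno, unsigned n)
{
    return regno % group_alignment(n) == 0;
}

/* Register number of the K-th member of a group starting at FIRST
   (the operand the proposal would write as "1#K"). */
unsigned group_member(unsigned first, unsigned k)
{
    return first + k;
}
```

Under this reading, a "Yg4" group may start at register 0, 4, 8 and so
on, and a group of 3 registers is aligned the same way as a group of 4.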


Re: [RFC] Support register groups in inline asm

2017-03-16 Thread Richard Biener
On Wed, 15 Mar 2017, Andrew Senkevich wrote:

> 2016-12-05 16:31 GMT+01:00 Andrew Senkevich :
> > 2016-11-16 8:02 GMT+03:00 Andrew Pinski :
> >> On Tue, Nov 15, 2016 at 9:36 AM, Andrew Senkevich
> >>  wrote:
> >>> Hi,
> >>>
> >>> new Intel instructions AVX512_4FMAPS and AVX512_4VNNIW introduce use
> >>> of register groups.
> >>>
> >>> To support register groups feature in inline asm needed some extension
> >>> with new constraints.
> >>>
> >>> Current proposal is the following syntax:
> >>>
> >>> __asm__ ("SMTH %[group], %[single]" :
> >>>          [single] "+x"(v0) :
> >>>          [group] "Yg4"(v1), "1+1"(v2), "1+2"(v3), "1+3"(v4));
> >>>
> >>> where "YgN" constraint specifies group of N consecutive registers
> >>> (which is started from register having number as "0 mod
> >>> 2^ceil(log2(N))"),
> >>> and "1+K" specifies the next registers in the group.
> >>>
> >>> Is this syntax ok? How to implement it?
> >>
> >>
> >> Have you looked into how AARCH64 back-end handles this via OI, etc.
> >> Like:
> >> /* Oct Int: 256-bit integer mode needed for 32-byte vector arguments.  */
> >> INT_MODE (OI, 32);
> >>
> >> /* Opaque integer modes for 3 or 4 Neon q-registers / 6 or 8 Neon d-registers
> >>    (2 d-regs = 1 q-reg = TImode).  */
> >> INT_MODE (CI, 48);
> >> INT_MODE (XI, 64);
> >>
> >>
> >> And then it implements the TARGET_ARRAY_MODE_SUPPORTED_P target hook?
> >> And the x2 types are defined as a struct of an array like:
> >> typedef struct int8x8x2_t
> >> {
> >>   int8x8_t val[2];
> >> } int8x8x2_t;
> >
> > Thanks!
> >
> > We have to update proposal with changing "+" symbol to "#" specifying
> > offset in a group (to avoid overloading the other meaning of “+”
> > specifying that operand is both input and output).
> >
> > So current proposal of syntax is:
> >
> > __asm__ ("INSTR %[group], %[single]" :
> >          [single] "+x"(v0) :
> >          [group] "Yg4"(v1), "1#1"(v2), "1#2"(v3), "1#3"(v4));
> >
> > where "YgN" constraint specifies group of N consecutive registers
> > (which is started from register having number as "0 mod 2^ceil(log2(N))"),
> > and "1#K" specifies the next registers in the group.
> >
> > Some other questions or comments?
> >
> > What about consensus on this syntax?
> 
> Hi Richard!
> 
> Can we have agreement on this syntax, what do you think?

I have no expertise / opinion here.

Richard.

> 
> --
> WBR,
> Andrew
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)