[RFC][ARM] Naming for new switch to check for mixed hardfloat/softfloat compat

2014-03-03 Thread Thomas Preudhomme

[Please CC me as I'm not subscribed to this list]

Hi there,

I'm currently working on adding a switch to check whether public 
functions involve float parameters or return values. Such a check would 
be useful for people trying to write code that is compatible with both 
the base standard (softfloat) and standard variant (hardfloat) ARM 
calling conventions. I also intend to set the ELF attribute 
Tag_ABI_VFP_args to value 3 (code compatible with both ABIs), so this 
check would help make sure that value can safely be set.


I initially thought about reusing -mfloat-abi with the value none for 
that purpose, since it would in effect define a new ABI in which no 
float can be used. However, it would then not be possible to forbid 
floats in the public interface while still using VFP instructions for 
float arithmetic (softfp), because this switch conflates the float ABI 
with the use of a floating-point unit for float arithmetic. Also, gcc 
passes -mfloat-abi down to the assembler, which would mean teaching the 
assembler about -mfloat-abi=none as well.


I thus think that a new switch would be better, and I am asking for your 
opinion about it as I would like this functionality to be incorporated 
into the gcc codebase.


Best regards,

Thomas Preud'homme


linux says it is a bug

2014-03-03 Thread lin zuojian
Hi,
in include/linux/compiler-gcc.h :

/* Optimization barrier */
/* The "volatile" is due to gcc bugs */
#define barrier() __asm__ __volatile__("": : :"memory")

The comment in Linux says this is a gcc bug. But would any sane compiler
really optimize across this asm without the "volatile" keyword?

--
Regards
lin zuojian



RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking

2014-03-03 Thread Matthew Fortune
> > Sorry, forgot about that.  In that case maybe program headers would be
> > best, like you say.  I.e. we could use a combination of GNU attributes
> > and a new program header, with the program header hopefully being more
> > general than for just this case.  I suppose this comes back to the
> > thread from binutils@ last year about how to manage the dwindling
> > number of free flags:
> >
> > https://www.sourceware.org/ml/binutils/2013-09/msg00039.html
> >  to https://www.sourceware.org/ml/binutils/2013-09/msg00099.html
> >

There are a couple of issues to resolve in order to use gnu attributes to 
record FP requirements at the module level. As it currently stands gnu 
attributes are controlled via the .gnu_attribute directive and these are 
emitted explicitly by the compiler. I think it is important that a more 
meaningful directive is available but it will need to interact nicely with the 
.gnu_attribute as well.

The first problem is that there will be new ways to influence whether a gnu 
attribute is emitted or not, i.e. the command-line options -mfp32, -mfpxx and 
-mfp64 will imply the relevant Tag_GNU_MIPS_ABI_FP attribute, and if the 
.module directive is present then that would override them. Will there be any 
problems with these new ways to generate a gnu attribute?

The second problem is that in order to support relaxing a mode requirement, 
any up-front directive/command-line option that sets a specific fp32/fp64 
requirement needs to be updatable to fpxx. With gnu attributes this would mean 
updating an existing Tag_GNU_MIPS_ABI_FP setting to be modeless.

I don't think any other port does this kind of thing in binutils but that 
doesn't mean we can't I guess.

Regards,
Matthew


Re: [AVR] remove two maintainers

2014-03-03 Thread David Brown

On 03/03/14 21:30, Eric Weddington wrote:

I just replied to Denis personally.

Agreed. I haven't done anything with the AVR port in a while, and I
probably won't be doing so for a while. Maybe some time in the future.
So it makes sense to remove me from maintainership.

Eric Weddington


On behalf of all avr gcc users, including the "winavr" project, let me 
offer you a big "thank you" for all your work on the port, and wish you 
luck in your life after Atmel.


mvh.,

David




Re: [RFC][PATCH 0/5] arch: atomic rework

2014-03-03 Thread Torvald Riegel
On Mon, 2014-03-03 at 11:20 -0800, Paul E. McKenney wrote:
> On Mon, Mar 03, 2014 at 07:55:08PM +0100, Torvald Riegel wrote:
> > 
> > On Fri, 2014-02-28 at 16:50 -0800, Paul E. McKenney wrote:
> > > +oDo not use the results from the boolean "&&" and "||" when
> > > + dereferencing.  For example, the following (rather improbable)
> > > + code is buggy:
> > > +
> > > + int a[2];
> > > + int index;
> > > + int force_zero_index = 1;
> > > +
> > > + ...
> > > +
> > > + r1 = rcu_dereference(i1)
> > > + r2 = a[r1 && force_zero_index];  /* BUGGY!!! */
> > > +
> > > + The reason this is buggy is that "&&" and "||" are often compiled
> > > + using branches.  While weak-memory machines such as ARM or PowerPC
> > > + do order stores after such branches, they can speculate loads,
> > > + which can result in misordering bugs.
> > > +
> > > +oDo not use the results from relational operators ("==", "!=",
> > > + ">", ">=", "<", or "<=") when dereferencing.  For example,
> > > + the following (quite strange) code is buggy:
> > > +
> > > + int a[2];
> > > + int index;
> > > + int flip_index = 0;
> > > +
> > > + ...
> > > +
> > > + r1 = rcu_dereference(i1)
> > > + r2 = a[r1 != flip_index];  /* BUGGY!!! */
> > > +
> > > + As before, the reason this is buggy is that relational operators
> > > + are often compiled using branches.  And as before, although
> > > + weak-memory machines such as ARM or PowerPC do order stores
> > > + after such branches, but can speculate loads, which can again
> > > + result in misordering bugs.
> > 
> > Those two would be allowed by the wording I have recently proposed,
> > AFAICS.  r1 != flip_index would result in two possible values (unless
> > there are further constraints due to the type of r1 and the values that
> > flip_index can have).
> 
> And I am OK with the value_dep_preserving type providing more/better
> guarantees than we get by default from current compilers.
> 
> One question, though.  Suppose that the code did not want a value
> dependency to be tracked through a comparison operator.  What does
> the developer do in that case?  (The reason I ask is that I have
> not yet found a use case in the Linux kernel that expects a value
> dependency to be tracked through a comparison.)

Hmm.  I suppose use an explicit cast to non-vdp before or after the
comparison?



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-03-03 Thread Torvald Riegel
On Sun, 2014-03-02 at 04:05 -0600, Peter Sewell wrote:
> On 1 March 2014 08:03, Paul E. McKenney  wrote:
> > On Sat, Mar 01, 2014 at 04:06:34AM -0600, Peter Sewell wrote:
> >> Hi Paul,
> >>
> >> On 28 February 2014 18:50, Paul E. McKenney  
> >> wrote:
> >> > On Thu, Feb 27, 2014 at 12:53:12PM -0800, Paul E. McKenney wrote:
> >> >> On Thu, Feb 27, 2014 at 11:47:08AM -0800, Linus Torvalds wrote:
> >> >> > On Thu, Feb 27, 2014 at 11:06 AM, Paul E. McKenney
> >> >> >  wrote:
> >> >> > >
> >> >> > > 3.  The comparison was against another RCU-protected pointer,
> >> >> > > where that other pointer was properly fetched using one
> >> >> > > of the RCU primitives.  Here it doesn't matter which pointer
> >> >> > > you use.  At least as long as the rcu_assign_pointer() for
> >> >> > > that other pointer happened after the last update to the
> >> >> > > pointed-to structure.
> >> >> > >
> >> >> > > I am a bit nervous about #3.  Any thoughts on it?
> >> >> >
> >> >> > I think that it might be worth pointing out as an example, and saying
> >> >> > that code like
> >> >> >
> >> >> >p = atomic_read(consume);
> >> >> >X;
> >> >> >q = atomic_read(consume);
> >> >> >Y;
> >> >> >if (p == q)
> >> >> > data = p->val;
> >> >> >
> >> >> > then the access of "p->val" is constrained to be data-dependent on
> >> >> > *either* p or q, but you can't really tell which, since the compiler
> >> >> > can decide that the values are interchangeable.
> >> >> >
> >> >> > I cannot for the life of me come up with a situation where this would
> >> >> > matter, though. If "X" contains a fence, then that fence will be a
> >> >> > stronger ordering than anything the consume through "p" would
> >> >> > guarantee anyway. And if "X" does *not* contain a fence, then the
> >> >> > atomic reads of p and q are unordered *anyway*, so then whether the
> >> >> > ordering to the access through "p" is through p or q is kind of
> >> >> > irrelevant. No?
> >> >>
> >> >> I can make a contrived litmus test for it, but you are right, the only
> >> >> time you can see it happen is when X has no barriers, in which case
> >> >> you don't have any ordering anyway -- both the compiler and the CPU can
> >> >> reorder the loads into p and q, and the read from p->val can, as you 
> >> >> say,
> >> >> come from either pointer.
> >> >>
> >> >> For whatever it is worth, here is the litmus test:
> >> >>
> >> >> T1:   p = kmalloc(...);
> >> >>   if (p == NULL)
> >> >>   deal_with_it();
> >> >>   p->a = 42;  /* Each field in its own cache line. */
> >> >>   p->b = 43;
> >> >>   p->c = 44;
> >> >>   atomic_store_explicit(&gp1, p, memory_order_release);
> >> >>   p->b = 143;
> >> >>   p->c = 144;
> >> >>   atomic_store_explicit(&gp2, p, memory_order_release);
> >> >>
> >> >> T2:   p = atomic_load_explicit(&gp2, memory_order_consume);
> >> >>   r1 = p->b;  /* Guaranteed to get 143. */
> >> >>   q = atomic_load_explicit(&gp1, memory_order_consume);
> >> >>   if (p == q) {
> >> >>   /* The compiler decides that q->c is same as p->c. */
> >> >>   r2 = p->c; /* Could get 44 on weakly ordered system. */
> >> >>   }
> >> >>
> >> >> The loads from gp1 and gp2 are, as you say, unordered, so you get what
> >> >> you get.
> >> >>
> >> >> And publishing a structure via one RCU-protected pointer, updating it,
> >> >> then publishing it via another pointer seems to me to be asking for
> >> >> trouble anyway.  If you really want to do something like that and still
> >> >> see consistency across all the fields in the structure, please put a 
> >> >> lock
> >> >> in the structure and use it to guard updates and accesses to those 
> >> >> fields.
> >> >
> >> > And here is a patch documenting the restrictions for the current Linux
> >> > kernel.  The rules change a bit due to rcu_dereference() acting a bit
> >> > differently than atomic_load_explicit(&p, memory_order_consume).
> >> >
> >> > Thoughts?
> >>
> >> That might serve as informal documentation for linux kernel
> >> programmers about the bounds on the optimisations that you expect
> >> compilers to do for common-case RCU code - and I guess that's what you
> >> intend it to be for.   But I don't see how one can make it precise
> >> enough to serve as a language definition, so that compiler people
> >> could confidently say "yes, we respect that", which I guess is what
> >> you really need.  As a useful criterion, we should aim for something
> >> precise enough that in a verified-compiler context you can
> >> mathematically prove that the compiler will satisfy it  (even though
> >> that won't happen anytime soon for GCC), and that analysis tool
> >> authors can actually know what they're working with.   All this stuff
> >> about "you should avoid cancellation", and "avoid masking with just a
> >> small number of bits" is just too vague.
> >
> > Understood, and yes, this is intended to docum

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-03-03 Thread Torvald Riegel
On Thu, 2014-02-27 at 17:02 -0800, Paul E. McKenney wrote:
> On Thu, Feb 27, 2014 at 09:50:21AM -0800, Paul E. McKenney wrote:
> > On Thu, Feb 27, 2014 at 04:37:33PM +0100, Torvald Riegel wrote:
> > > 
> > > On Mon, 2014-02-24 at 11:54 -0800, Linus Torvalds wrote:
> > > > On Mon, Feb 24, 2014 at 10:53 AM, Paul E. McKenney
> > > >  wrote:
> > > > >
> > > > > Good points.  How about the following replacements?
> > > > >
> > > > > 3.  Adding or subtracting an integer to/from a chained pointer
> > > > > results in another chained pointer in that same pointer chain.
> > > > > The results of addition and subtraction operations that cancel
> > > > > the chained pointer's value (for example, "p-(long)p" where 
> > > > > "p"
> > > > > is a pointer to char) are implementation defined.
> > > > >
> > > > > 4.  Bitwise operators ("&", "|", "^", and I suppose also "~")
> > > > > applied to a chained pointer and an integer for the purposes
> > > > > of alignment and pointer translation results in another
> > > > > chained pointer in that same pointer chain.  Other uses
> > > > > of bitwise operators on chained pointers (for example,
> > > > > "p|~0") are implementation defined.
> > > > 
> > > > Quite frankly, I think all of this language that is about the actual
> > > > operations is irrelevant and wrong.
> > > > 
> > > > It's not going to help compiler writers, and it sure isn't going to
> > > > help users that read this.
> > > > 
> > > > Why not just talk about "value chains" and that any operations that
> > > > restrict the value range severely end up breaking the chain. There is
> > > > no point in listing the operations individually, because every single
> > > > operation *can* restrict things. Listing individual operations and
> > > > depdendencies is just fundamentally wrong.
> > > 
> > > [...]
> > > 
> > > > The *only* thing that matters for all of them is whether they are
> > > > "value-preserving", or whether they drop so much information that the
> > > > compiler might decide to use a control dependency instead. That's true
> > > > for every single one of them.
> > > > 
> > > > Similarly, actual true control dependencies that limit the problem
> > > > space sufficiently that the actual pointer value no longer has
> > > > significant information in it (see the above example) are also things
> > > > that remove information to the point that only a control dependency
> > > > remains. Even when the value itself is not modified in any way at all.
> > > 
> > > I agree that just considering syntactic properties of the program seems
> > > to be insufficient.  Making it instead depend on whether there is a
> > > "semantic" dependency due to a value being "necessary" to compute a
> > > result seems better.  However, whether a value is "necessary" might not
> > > be obvious, and I understand Paul's argument that he does not want to
> > > have to reason about all potential compiler optimizations.  Thus, I
> > > believe we need to specify when a value is "necessary".
> > > 
> > > I have a suggestion for a somewhat different formulation of the feature
> > > that you seem to have in mind, which I'll discuss below.  Excuse the
> > > verbosity of the following, but I'd rather like to avoid
> > > misunderstandings than save a few words.
> > 
> > Thank you very much for putting this forward!  I must confess that I was
> > stuck, and my earlier attempt now enshrined in the C11 and C++11 standards
> > is quite clearly way bogus.
> > 
> > One possible saving grace:  From discussions at the standards committee
> > meeting a few weeks ago, there is a some chance that the committee will
> > be willing to do a rip-and-replace on the current memory_order_consume
> > wording, without provisions for backwards compatibility with the current
> > bogosity.
> > 
> > > What we'd like to capture is that a value originating from a mo_consume
> > > load is "necessary" for a computation (e.g., it "cannot" be replaced
> > > with value predictions and/or control dependencies); if that's the case
> > > in the program, we can reasonably assume that a compiler implementation
> > > will transform this into a data dependency, which will then lead to
> > > ordering guarantees by the HW.
> > > 
> > > However, we need to specify when a value is "necessary".  We could say
> > > that this is implementation-defined, and use a set of litmus tests
> > > (e.g., like those discussed in the thread) to roughly carve out what a
> > > programmer could expect.  This may even be practical for a project like
> > > the Linux kernel that follows strict project-internal rules and pays a
> > > lot of attention to what the particular implementations of compilers
> > > expected to compile the kernel are doing.  However, I think this
> > > approach would be too vague for the standard and for many other
> > > programs/projects.
> > 
> > I agree 

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-03-03 Thread Paul E. McKenney
On Mon, Mar 03, 2014 at 07:55:08PM +0100, Torvald Riegel wrote:
> 
> On Fri, 2014-02-28 at 16:50 -0800, Paul E. McKenney wrote:
> > +o  Do not use the results from the boolean "&&" and "||" when
> > +   dereferencing.  For example, the following (rather improbable)
> > +   code is buggy:
> > +
> > +   int a[2];
> > +   int index;
> > +   int force_zero_index = 1;
> > +
> > +   ...
> > +
> > +   r1 = rcu_dereference(i1)
> > +   r2 = a[r1 && force_zero_index];  /* BUGGY!!! */
> > +
> > +   The reason this is buggy is that "&&" and "||" are often compiled
> > +   using branches.  While weak-memory machines such as ARM or PowerPC
> > +   do order stores after such branches, they can speculate loads,
> > +   which can result in misordering bugs.
> > +
> > +o  Do not use the results from relational operators ("==", "!=",
> > +   ">", ">=", "<", or "<=") when dereferencing.  For example,
> > +   the following (quite strange) code is buggy:
> > +
> > +   int a[2];
> > +   int index;
> > +   int flip_index = 0;
> > +
> > +   ...
> > +
> > +   r1 = rcu_dereference(i1)
> > +   r2 = a[r1 != flip_index];  /* BUGGY!!! */
> > +
> > +   As before, the reason this is buggy is that relational operators
> > +   are often compiled using branches.  And as before, although
> > +   weak-memory machines such as ARM or PowerPC do order stores
> > +   after such branches, but can speculate loads, which can again
> > +   result in misordering bugs.
> 
> Those two would be allowed by the wording I have recently proposed,
> AFAICS.  r1 != flip_index would result in two possible values (unless
> there are further constraints due to the type of r1 and the values that
> flip_index can have).

And I am OK with the value_dep_preserving type providing more/better
guarantees than we get by default from current compilers.

One question, though.  Suppose that the code did not want a value
dependency to be tracked through a comparison operator.  What does
the developer do in that case?  (The reason I ask is that I have
not yet found a use case in the Linux kernel that expects a value
dependency to be tracked through a comparison.)

> I don't think the wording is flawed.  We could raise the requirement of
> having more than one value left for r1 to having more than N with N > 1
> values left, but the fundamental problem remains in that a compiler
> could try to generate a (big) switch statement.
> 
> Instead, I think that this indicates that the value_dep_preserving type
> modifier would be useful: It would tell the compiler that it shouldn't
> transform this into a branch in this case, yet allow that optimization
> for all other code.

Understood!

BTW, my current task is generating examples using the value_dep_preserving
type for RCU-protected array indexes.

Thanx, Paul



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-03-03 Thread Torvald Riegel
On Fri, 2014-02-28 at 16:50 -0800, Paul E. McKenney wrote:
> +oDo not use the results from the boolean "&&" and "||" when
> + dereferencing.  For example, the following (rather improbable)
> + code is buggy:
> +
> + int a[2];
> + int index;
> + int force_zero_index = 1;
> +
> + ...
> +
> + r1 = rcu_dereference(i1)
> + r2 = a[r1 && force_zero_index];  /* BUGGY!!! */
> +
> + The reason this is buggy is that "&&" and "||" are often compiled
> + using branches.  While weak-memory machines such as ARM or PowerPC
> + do order stores after such branches, they can speculate loads,
> + which can result in misordering bugs.
> +
> +oDo not use the results from relational operators ("==", "!=",
> + ">", ">=", "<", or "<=") when dereferencing.  For example,
> + the following (quite strange) code is buggy:
> +
> + int a[2];
> + int index;
> + int flip_index = 0;
> +
> + ...
> +
> + r1 = rcu_dereference(i1)
> + r2 = a[r1 != flip_index];  /* BUGGY!!! */
> +
> + As before, the reason this is buggy is that relational operators
> + are often compiled using branches.  And as before, although
> + weak-memory machines such as ARM or PowerPC do order stores
> + after such branches, but can speculate loads, which can again
> + result in misordering bugs.

Those two would be allowed by the wording I have recently proposed,
AFAICS.  r1 != flip_index would result in two possible values (unless
there are further constraints due to the type of r1 and the values that
flip_index can have).

I don't think the wording is flawed.  We could raise the requirement of
having more than one value left for r1 to having more than N with N > 1
values left, but the fundamental problem remains in that a compiler
could try to generate a (big) switch statement.

Instead, I think that this indicates that the value_dep_preserving type
modifier would be useful: It would tell the compiler that it shouldn't
transform this into a branch in this case, yet allow that optimization
for all other code.



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-03-03 Thread Torvald Riegel
On Thu, 2014-02-27 at 09:50 -0800, Paul E. McKenney wrote:
> Your proposal looks quite promising at first glance.  But rather than
> try and comment on it immediately, I am going to take a number of uses of
> RCU from the Linux kernel and apply your proposal to them, then respond
> with the results
> 
> Fair enough?

Sure.  Thanks for doing the cross-check!



Re: [RFC][PATCH 0/5] arch: atomic rework

2014-03-03 Thread Torvald Riegel
On Thu, 2014-02-27 at 11:47 -0800, Linus Torvalds wrote:
> On Thu, Feb 27, 2014 at 11:06 AM, Paul E. McKenney
>  wrote:
> >
> > 3.  The comparison was against another RCU-protected pointer,
> > where that other pointer was properly fetched using one
> > of the RCU primitives.  Here it doesn't matter which pointer
> > you use.  At least as long as the rcu_assign_pointer() for
> > that other pointer happened after the last update to the
> > pointed-to structure.
> >
> > I am a bit nervous about #3.  Any thoughts on it?
> 
> I think that it might be worth pointing out as an example, and saying
> that code like
> 
>p = atomic_read(consume);
>X;
>q = atomic_read(consume);
>Y;
>if (p == q)
> data = p->val;
> 
> then the access of "p->val" is constrained to be data-dependent on
> *either* p or q, but you can't really tell which, since the compiler
> can decide that the values are interchangeable.

The wording I proposed would make the p dereference have a value
dependency unless X and Y would somehow restrict p and q.  The reasoning
is that if the atomic loads return potentially more than one value, then
even if we find out that two such loads did return the same value, we
still don't know what the exact value was.



Re: [AVR] remove two maintainers

2014-03-03 Thread Denis Chertykov
2014-03-03 21:01 GMT+04:00 David Edelsohn :
> On Mon, Mar 3, 2014 at 7:04 AM, Denis Chertykov  wrote:
>> 2014-03-03 15:35 GMT+04:00 David Brown :
>>> On 02/03/14 19:24, Denis Chertykov wrote:
 I would remove two maintainers for AVR port:
 1. Anatoly Sokolov 
 2. Eric Weddington 

 I have discussed the removal with Anatoly Sokolov and he agrees with it.
 I can't discuss the removal with Eric Weddington because his mail
 address is invalid.

 Must somebody approve the removal?  (Or can I just apply it?)

 Denis.

>>>
>>> Eric Weddington has left Atmel, so his address will no longer be valid.
>>>  I don't know if he still has time to work with AVRs, or if he would
>>> still be able to be a maintainer for the AVR port.  But I am pretty sure
>>> that his new job will not involve AVR's significantly, so it would only
>>> be as a hobby (or at best, as a normal avr gcc user).
>>>
>>> Atmel includes gcc in their development tool (AVR Studio), as well as
>>> providing pre-built packages (for Windows and Linux) with the avr-libc
>>> library and related tools, using snapshots of mainline gcc with a few
>>> patches (for things like support of newer devices).  So it seems
>>> reasonable to expect that they will be interested in the development and
>>> maintenance of the avr port of gcc even though Eric has now left them.
>>> If you would like, I can try to contact Atmel and ask if they have
>>> someone who would like to take Eric's seat as a port maintainer (or you
>>> could do so yourself from Atmel's website).
>>
>> I'm not looking for additional maintainers; I just want to remove inactive ones.
>
> Maintainers can resign at any time. Mentoring and nominating new
> maintainers is appreciated.
>
> Note that maintainers are a personal appointment, not a company
> appointment, so job changes do not affect maintainership.
>
> You can apply the change to remove Anatoly's name. Someone should
> confirm with Eric Weddington that he wishes to resign.

Thank you for explanation.
I'm waiting for Eric.
His new email address: eric.wedding...@gmail.com

Denis.


Re: [AVR] remove two maintainers

2014-03-03 Thread David Edelsohn
On Mon, Mar 3, 2014 at 7:04 AM, Denis Chertykov  wrote:
> 2014-03-03 15:35 GMT+04:00 David Brown :
>> On 02/03/14 19:24, Denis Chertykov wrote:
>>> I would remove two maintainers for AVR port:
>>> 1. Anatoly Sokolov 
>>> 2. Eric Weddington 
>>>
>>> I have discussed the removal with Anatoly Sokolov and he agrees with it.
>>> I can't discuss the removal with Eric Weddington because his mail
>>> address is invalid.
>>>
>>> Must somebody approve the removal?  (Or can I just apply it?)
>>>
>>> Denis.
>>>
>>
>> Eric Weddington has left Atmel, so his address will no longer be valid.
>>  I don't know if he still has time to work with AVRs, or if he would
>> still be able to be a maintainer for the AVR port.  But I am pretty sure
>> that his new job will not involve AVR's significantly, so it would only
>> be as a hobby (or at best, as a normal avr gcc user).
>>
>> Atmel includes gcc in their development tool (AVR Studio), as well as
>> providing pre-built packages (for Windows and Linux) with the avr-libc
>> library and related tools, using snapshots of mainline gcc with a few
>> patches (for things like support of newer devices).  So it seems
>> reasonable to expect that they will be interested in the development and
>> maintenance of the avr port of gcc even though Eric has now left them.
>> If you would like, I can try to contact Atmel and ask if they have
>> someone who would like to take Eric's seat as a port maintainer (or you
>> could do so yourself from Atmel's website).
>
> I'm not looking for additional maintainers; I just want to remove inactive ones.

Maintainers can resign at any time. Mentoring and nominating new
maintainers is appreciated.

Note that maintainers are a personal appointment, not a company
appointment, so job changes do not affect maintainership.

You can apply the change to remove Anatoly's name. Someone should
confirm with Eric Weddington that he wishes to resign.

Thanks, David


Re: [RFC][PATCH 0/5] arch: atomic rework

2014-03-03 Thread Torvald Riegel
On Thu, 2014-02-27 at 09:01 -0800, Linus Torvalds wrote:
> On Thu, Feb 27, 2014 at 7:37 AM, Torvald Riegel  wrote:
> > Regarding the latter, we make a fresh start at each mo_consume load (ie,
> > we assume we know nothing -- L could have returned any possible value);
> > I believe this is easier to reason about than other scopes like function
> > granularities (what happens on inlining?), or translation units.  It
> > should also be simple to implement for compilers, and would hopefully
> > not constrain optimization too much.
> >
> > [...]
> >
> > Paul's litmus test would work, because we guarantee to the programmer
> > that it can assume that the mo_consume load would return any value
> > allowed by the type; effectively, this forbids the compiler analysis
> > Paul thought about:
> 
> So realistically, since with the new wording we can ignore the silly
> cases (ie "p-p") and we can ignore the trivial-to-optimize compiler
> cases ("if (p == &variable) .. use p"), and you would forbid the
> "global value range optimization case" that Paul bright up, what
> remains would seem to be just really subtle compiler transformations
> of data dependencies to control dependencies.
> 
> And the only such thing I can think of is basically compiler-initiated
> value-prediction, presumably directed by PGO (since now if the value
> prediction is in the source code, it's considered to break the value
> chain).

The other example that comes to mind would be feedback-directed JIT
compilation.  I don't think that's widely used today, and it might never
be for the kernel -- but *in the standard*, we at least have to consider
what the future might bring.

> The good thing is that afaik, value-prediction is largely not used in
> real life, afaik. There are lots of papers on it, but I don't think
> anybody actually does it (although I can easily see some
> specint-specific optimization pattern that is build up around it).
> 
> And even value prediction is actually fine, as long as the compiler
> can see the memory *source* of the value prediction (and it isn't a
> mo_consume). So it really ends up limiting your value prediction in
> very simple ways: you cannot do it to function arguments if they are
> registers. But you can still do value prediction on values you loaded
> from memory, if you can actually *see* that memory op.

I think one would need to show that the source is *not even indirectly*
a mo_consume load.  With the wording I proposed, value dependencies
don't break when storing to / loading from memory locations.

Thus, if a compiler ends up at a memory load after walking SSA, it needs
to prove that the load cannot read a value that (1) was produced by a
store sequenced-before the load and (2) might carry a value dependency
(e.g., by being a mo_consume load) that the value prediction in question
would break.  This, in general, requires alias analysis.
Deciding whether a prediction would break a value dependency has to
consider what later stages in a compiler would be doing, including LTO
or further rounds of inlining/optimizations.  OTOH, if the compiler can
treat an mo_consume load as returning all possible values (eg, by
ignoring all knowledge about it), then it can certainly do so with other
memory loads too.

So, I think that the constraints due to value dependencies can matter in
practice.  However, the impact on optimizations on
non-mo_consume-related code are hard to estimate -- I don't see a huge
amount of impact right now, but I also wouldn't want to predict that
this can't change in the future.

> Of course, on more strongly ordered CPU's, even that "register
> argument" limitation goes away.
> 
> So I agree that there is basically no real optimization constraint.
> Value-prediction is of dubious value to begin with, and the actual
> constraint on its use if some compiler writer really wants to is not
> onerous.
> 
> > What I have in mind is roughly the following (totally made-up syntax --
> > suggestions for how to do this properly are very welcome):
> > * Have a type modifier (eg, like restrict), that specifies that
> > operations on data of this type are preserving value dependencies:
> 
> So I'm not violently opposed, but I think the upsides are not great.
> Note that my earlier suggestion to use "restrict" wasn't because I
> believed the annotation itself would be visible, but basically just as
> a legalistic promise to the compiler that *if* it found an alias, then
> it didn't need to worry about ordering. So to me, that type modifier
> was about conceptual guarantees, not about actual value chains.
> 
> Anyway, the reason I don't believe any type modifier (and
> "[[carries_dependency]]" is basically just that) is worth it is simply
> that it adds a real burden on the programmer, without actually giving
> the programmer any real upside:
> 
> Within a single function, the compiler already sees that mo_consume
> source, and so doing a type-based restriction doesn't really help. The
> information is alrea

Re: Asm volatile causing performance regressions on ARM

2014-03-03 Thread David Brown
On 03/03/14 14:54, Richard Biener wrote:
> On Mon, Mar 3, 2014 at 1:53 PM, David Brown  wrote:
>> On 03/03/14 11:49, Richard Biener wrote:
>>> On Mon, Mar 3, 2014 at 11:41 AM, David Brown  wrote:
 On 28/02/14 13:19, Richard Sandiford wrote:
> Georg-Johann Lay  writes:
>> Notice that in code1, func might contain such asm-pairs to implement
>> atomic operations, but moving costly_func across func does *not*
>> affect the interrupt response times in such a disastrous way.
>>
>> Thus you must be *very* careful w.r.t. optimizing against asm volatile
>> + memory clobber.  It's too easy to miss some side effects of *real*
>> code.
>
> I understand the example, but I don't think volatile asms guarantee
> what you want here.
>
>> Optimizing code to scrap and pointing to some GCC internal reasoning or 
>> some
>> standard's wording does not help with real code.
>
> But how else can a compiler work?  It doesn't just regurgitate canned 
> code,
> so it can't rely on human intuition as to what "makes sense".  We have to
> have specific rules and guarantees and say that anything outside those
> rules and guarantees is undefined.
>
> It sounds like you want an asm with an extra-strong ordering guarantee.
> I think that would need to be an extension, since it would need to 
> consider
> cases where the asm is used in a function.  (Shades of carries_dependence
> or whatever in the huge atomic thread.)  I think anything where:
>
>   void foo (void) { X; }
>   void bar (void) { Y1; foo (); Y2; }
>
> has different semantics from:
>
>   void bar (void) { Y1; X; Y2; }
>
> is very dangerous.  And assuming that any function call could enable
> or disable interrupts, and therefore that nothing can be moved across
> a non-const function call, would limit things a bit too much.
>
> Thanks,
> Richard
>
>

 I think the problem stems from "volatile" being a barrier to /data flow/
 changes,
>>>
>>> What kind of /data flow/ changes?  It certainly isn't that currently,
>>> only two volatiles always conflict but not a volatile and a non-volatile 
>>> mem:
>>>
>>> static int
>>> true_dependence_1 (const_rtx mem, enum machine_mode mem_mode, rtx mem_addr,
>>>const_rtx x, rtx x_addr, bool mem_canonicalized)
>>> {
>>> ...
>>>   if (MEM_VOLATILE_P (x) && MEM_VOLATILE_P (mem))
>>> return 1;
>>>
>>> bool
>>> refs_may_alias_p_1 (ao_ref *ref1, ao_ref *ref2, bool tbaa_p)
>>> {
>>> ...
>>>   /* Two volatile accesses always conflict.  */
>>>   if (ref1->volatile_p
>>>   && ref2->volatile_p)
>>> return true;
>>>
 but what is needed in this case is a barrier to /control flow/
 changes.  To my knowledge, C does not provide any way of doing this, nor
 are there existing gcc extensions to guarantee the ordering.  But it
 certainly is the case that control flow ordering like this is important
 - it can be critical in embedded systems (such as in the example here by
 Georg-Johann), but it can also be important for non-embedded systems
 (such as to minimise the time spent while holding a lock).
>>>
>>> Can you elaborate on this?  I have a hard time thinking of a
>>> control flow transform that affects volatiles.
>>>
>>> Richard.
>>>
>>
>> I am perhaps not expressing myself very clearly here (and I don't know
>> the internals of gcc well enough to use the source to help).
>>
>> Normal (i.e., not "asm") volatile accesses force an order on those
>> volatile data accesses - if the source code says a volatile read of "x"
>> then a volatile read of "y", then the compiler has to issue those reads
>> in that order.  It can't re-order them, or hoist them out of a loop, or
>> do any other re-ordering optimisations.  Clobbers, inputs and outputs in
>> inline assembly give a similar ordering on the data flow.  But none of
>> this affects the /control/ flow.  So the __attribute__((const))
>> "costly_func" described by Georg-Johann can be moved freely by the
>> compiler amongst these volatile /data/ accesses.
>>
>> The C abstract machine does not have any concept of timings, only of
>> observable accesses (volatile accesses, calls to external code, and
>> entry/exit from main()).  So it does not distinguish between the sequences:
>>
>> volX = 1;
>> y = costly_func(z);
>> volX = 2;
>>
>> and
>>
>> y = costly_func(z);
>> volX = 1;
>> volX = 2;
>>
>> and
>> volX = 1;
>> volX = 2;
>> y = costly_func(z);
>>
>> (This assumes that costly_func is __attribute__((const)), and y and z
>> are non-volatile.)
>>
>> For some real-world usage, however, these sequences are very different.
>>  In "big" systems, it is unlikely to change correctness.  If "volX" were
>> part of a locking mechanism, for example, then each version of this code
>> would be correct - but they might differ in the length of time that
>> the locks were held.

Re: [RFC] Meta-description for tree and gimple folding

2014-03-03 Thread Kai Tietz
2014-03-03 12:33 GMT+01:00 Richard Biener :
> On Fri, 28 Feb 2014, Kai Tietz wrote:
>
>> Hmm, this all reminds me of the approach Andrew Pinski and I came
>> up with two years ago.
>
> You are talking about the gimple folding interface?  Yes, but it's
> more similar to what I proposed before that.

Well, this interface was for rtl, gimple, and tree AFAIR.


>> So I doubt that we want to keep fold-const patterns similar to gimple
>> (forward-prop) ones.
>> Wouldn't it make more sense to move fold-const patterns completely
>> into gimple, and having a facility in FE to ask gimple to *pre*-fold a
>> given tree to see if a constant-expression can be achieved?
>
> That was proposed by somebody, yes.  The FE would hand off an
> expression to 1) the gimplifier to gimplify it, then 2) to the
> gimple folder to simplify it.  Not sure if that's a good design
> but yes, it mimics the awkward thing we do now (genericize for
> folding in fold_stmt), just the other way around - and it makes
> it very costly.

Right, if we redid steps one and two each time we visit the same
statement, then of course we would incur a pretty high load.
By hashing the *pre*-computed gimple expression, I think the load of
such an approach would be much lower.  It is of course true that we
move gimplification costs into the FE.  Nevertheless, the average
costs should in general be the same as they are now.
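Kai's hashing suggestion can be sketched as a toy memo table
(hypothetical plain C, not GCC code): the FE keys on the pre-computed
expression and reuses the folded result instead of re-running
gimplification and folding on every visit.

```c
#include <stdint.h>

/* Toy memo table: cache the folded result of (code, a, b) so a
   front end could ask for a pre-folded constant repeatedly without
   redoing the gimplify + fold work each time.  Entirely illustrative:
   slot count, key function and eviction policy are all assumptions. */
enum { MEMO_SLOTS = 256 };

struct memo { uint32_t key; int valid; int result; };
static struct memo memo_tab[MEMO_SLOTS];

static uint32_t memo_key(int code, int a, int b)
{
    uint32_t h = 2166136261u;          /* FNV-1a over the three fields */
    h = (h ^ (uint32_t)code) * 16777619u;
    h = (h ^ (uint32_t)a) * 16777619u;
    h = (h ^ (uint32_t)b) * 16777619u;
    return h;
}

/* Returns 1 and sets *out on a cache hit; on a miss the caller folds
   for real and records the result with memo_put.  */
static int memo_get(int code, int a, int b, int *out)
{
    uint32_t k = memo_key(code, a, b);
    struct memo *m = &memo_tab[k % MEMO_SLOTS];
    if (m->valid && m->key == k) { *out = m->result; return 1; }
    return 0;
}

static void memo_put(int code, int a, int b, int result)
{
    uint32_t k = memo_key(code, a, b);
    struct memo *m = &memo_tab[k % MEMO_SLOTS];
    m->key = k; m->valid = 1; m->result = result;
}
```

Direct-mapped slots mean a colliding entry simply evicts the old one,
which is safe for a cache: a miss only costs a re-fold.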


> Having a single meta-description of simplifications makes it
> possible to have the best of both worlds (no need to GENERICIZE
> back from GIMPLE and no need to GIMPLIFY from GENERIC) with
> a single point of maintainance.

True.  I fully agree about the positive effects of a single
meta-description for this area.  It is certainly worth avoiding
re-doing the same folding for GENERIC/GIMPLE again and again.

> [the possibility to use offline verification tools for the
> transforms comes to my mind as well]
This is actually a pretty interesting idea, as it would allow us to
test this area without side effects from high-level passes, target
properties, etc.

> If you think the .md style pattern definitions are too limiting
> can you think of sth more powerful without removing the possibility
> of implementing the matching with a state machine to make it O(1)?

Well, I am not opposed to the use of .md-style pattern definitions at
all.  I just see some weaknesses in the current tree-folding
mechanism.
The AST folder tries to fold statements by recursing into them.  This
causes pretty high load in several areas, like stack growth,
unnecessary repetition of operations on inner statements, and complex
patterns for expressing associative/distributive/commutative rules for
the current operation code.
I am thinking about a model where we use the .md-style pattern
definitions just for the base fold operations.  On top of this model
we set a layer implementing associative/commutative/distributive
properties for statements in an optimized form.
By this we can do two different things with lower load.  On the one
hand we can do "virtual" folding and avoid needless tree rebuilding.
On the other hand we can use the same pass to "normalize" the tree
structure of expression chains.
Additionally we can get rid of the then pretty useless reassociation
pass, which is IMHO only necessary because of the gimple folder's
inability to do that.

> Thanks,
> Richard.

Additionally, I think we need to make some related conceptual
decisions here too.  I see a missing concept for abstracting
back-end/target-specific limitations in the middle end: for example
branch-cost optimization, the target's pattern preferences, etc.
We fail to make a consistent distinction between target-agnostic and
target-specific passes in the middle end.  Diagnostics were introduced
in late passes; for the latter see "possible integral overflow on
additions in loops" as one sample.  Such things hinder us from doing a
better job on pattern folding in many places.  Right now we tend to
introduce the back-end representation too early (e.g. branch cost,
avoiding packing of conditional patterns due to some limitation on
some targets, too early jump threading, etc.).  So I think we should
have at least some passes doing target/back-end-agnostic
folding/normalization, and do target/back-end-specific pattern
transformation explicitly in later passes.

Regards,
Kai


Re: Asm volatile causing performance regressions on ARM

2014-03-03 Thread Richard Biener
On Mon, Mar 3, 2014 at 1:53 PM, David Brown  wrote:
> On 03/03/14 11:49, Richard Biener wrote:
>> On Mon, Mar 3, 2014 at 11:41 AM, David Brown  wrote:
>>> On 28/02/14 13:19, Richard Sandiford wrote:
 Georg-Johann Lay  writes:
> Notice that in code1, func might contain such asm-pairs to implement
> atomic operations, but moving costly_func across func does *not*
> affect the interrupt response times in such a disastrous way.
>
> Thus you must be *very* careful w.r.t. optimizing against asm volatile
> + memory clobber.  It's too easy to miss some side effects of *real*
> code.

 I understand the example, but I don't think volatile asms guarantee
 what you want here.

> Optimizing code to scrap and pointing to some GCC internal reasoning or 
> some
> standard's wording does not help with real code.

 But how else can a compiler work?  It doesn't just regurgitate canned code,
 so it can't rely on human intuition as to what "makes sense".  We have to
 have specific rules and guarantees and say that anything outside those
 rules and guarantees is undefined.

 It sounds like you want an asm with an extra-strong ordering guarantee.
 I think that would need to be an extension, since it would need to consider
 cases where the asm is used in a function.  (Shades of carries_dependence
 or whatever in the huge atomic thread.)  I think anything where:

   void foo (void) { X; }
   void bar (void) { Y1; foo (); Y2; }

 has different semantics from:

   void bar (void) { Y1; X; Y2; }

 is very dangerous.  And assuming that any function call could enable
 or disable interrupts, and therefore that nothing can be moved across
 a non-const function call, would limit things a bit too much.

 Thanks,
 Richard


>>>
>>> I think the problem stems from "volatile" being a barrier to /data flow/
>>> changes,
>>
>> What kind of /data flow/ changes?  It certainly isn't that currently,
>> only two volatiles always conflict but not a volatile and a non-volatile mem:
>>
>> static int
>> true_dependence_1 (const_rtx mem, enum machine_mode mem_mode, rtx mem_addr,
>>const_rtx x, rtx x_addr, bool mem_canonicalized)
>> {
>> ...
>>   if (MEM_VOLATILE_P (x) && MEM_VOLATILE_P (mem))
>> return 1;
>>
>> bool
>> refs_may_alias_p_1 (ao_ref *ref1, ao_ref *ref2, bool tbaa_p)
>> {
>> ...
>>   /* Two volatile accesses always conflict.  */
>>   if (ref1->volatile_p
>>   && ref2->volatile_p)
>> return true;
>>
>>> but what is needed in this case is a barrier to /control flow/
>>> changes.  To my knowledge, C does not provide any way of doing this, nor
>>> are there existing gcc extensions to guarantee the ordering.  But it
>>> certainly is the case that control flow ordering like this is important
>>> - it can be critical in embedded systems (such as in the example here by
>>> Georg-Johann), but it can also be important for non-embedded systems
>> (such as to minimise the time spent while holding a lock).
>>
>> Can you elaborate on this?  I have a hard time thinking of a
>> control flow transform that affects volatiles.
>>
>> Richard.
>>
>
> I am perhaps not expressing myself very clearly here (and I don't know
> the internals of gcc well enough to use the source to help).
>
> Normal (i.e., not "asm") volatile accesses force an order on those
> volatile data accesses - if the source code says a volatile read of "x"
> then a volatile read of "y", then the compiler has to issue those reads
> in that order.  It can't re-order them, or hoist them out of a loop, or
> do any other re-ordering optimisations.  Clobbers, inputs and outputs in
> inline assembly give a similar ordering on the data flow.  But none of
> this affects the /control/ flow.  So the __attribute__((const))
> "costly_func" described by Georg-Johann can be moved freely by the
> compiler amongst these volatile /data/ accesses.
>
> The C abstract machine does not have any concept of timings, only of
> observable accesses (volatile accesses, calls to external code, and
> entry/exit from main()).  So it does not distinguish between the sequences:
>
> volX = 1;
> y = costly_func(z);
> volX = 2;
>
> and
>
> y = costly_func(z);
> volX = 1;
> volX = 2;
>
> and
> volX = 1;
> volX = 2;
> y = costly_func(z);
>
> (This assumes that costly_func is __attribute__((const)), and y and z
> are non-volatile.)
>
> For some real-world usage, however, these sequences are very different.
>  In "big" systems, it is unlikely to change correctness.  If "volX" were
> part of a locking mechanism, for example, then each version of this code
> would be correct - but they might differ in the length of time that the
> locks were held, and that could seriously affect performance.  In
> embedded systems, low performance could mean failure.  The problem is
> exacerbated by small CPUs that need library functions for seemingly
> simple operations - gcc might happily move a division operation around
> without realising the massive time cost on an 8-bit processor.

Re: Asm volatile causing performance regressions on ARM

2014-03-03 Thread David Brown
On 03/03/14 11:49, Richard Biener wrote:
> On Mon, Mar 3, 2014 at 11:41 AM, David Brown  wrote:
>> On 28/02/14 13:19, Richard Sandiford wrote:
>>> Georg-Johann Lay  writes:
 Notice that in code1, func might contain such asm-pairs to implement
 atomic operations, but moving costly_func across func does *not*
 affect the interrupt response times in such a disastrous way.

 Thus you must be *very* careful w.r.t. optimizing against asm volatile
 + memory clobber.  It's too easy to miss some side effects of *real*
 code.
>>>
>>> I understand the example, but I don't think volatile asms guarantee
>>> what you want here.
>>>
 Optimizing code to scrap and pointing to some GCC internal reasoning or 
 some
 standard's wording does not help with real code.
>>>
>>> But how else can a compiler work?  It doesn't just regurgitate canned code,
>>> so it can't rely on human intuition as to what "makes sense".  We have to
>>> have specific rules and guarantees and say that anything outside those
>>> rules and guarantees is undefined.
>>>
>>> It sounds like you want an asm with an extra-strong ordering guarantee.
>>> I think that would need to be an extension, since it would need to consider
>>> cases where the asm is used in a function.  (Shades of carries_dependence
>>> or whatever in the huge atomic thread.)  I think anything where:
>>>
>>>   void foo (void) { X; }
>>>   void bar (void) { Y1; foo (); Y2; }
>>>
>>> has different semantics from:
>>>
>>>   void bar (void) { Y1; X; Y2; }
>>>
>>> is very dangerous.  And assuming that any function call could enable
>>> or disable interrupts, and therefore that nothing can be moved across
>>> a non-const function call, would limit things a bit too much.
>>>
>>> Thanks,
>>> Richard
>>>
>>>
>>
>> I think the problem stems from "volatile" being a barrier to /data flow/
>> changes,
> 
> What kind of /data flow/ changes?  It certainly isn't that currently,
> only two volatiles always conflict but not a volatile and a non-volatile mem:
> 
> static int
> true_dependence_1 (const_rtx mem, enum machine_mode mem_mode, rtx mem_addr,
>const_rtx x, rtx x_addr, bool mem_canonicalized)
> {
> ...
>   if (MEM_VOLATILE_P (x) && MEM_VOLATILE_P (mem))
> return 1;
> 
> bool
> refs_may_alias_p_1 (ao_ref *ref1, ao_ref *ref2, bool tbaa_p)
> {
> ...
>   /* Two volatile accesses always conflict.  */
>   if (ref1->volatile_p
>   && ref2->volatile_p)
> return true;
> 
>> but what is needed in this case is a barrier to /control flow/
>> changes.  To my knowledge, C does not provide any way of doing this, nor
>> are there existing gcc extensions to guarantee the ordering.  But it
>> certainly is the case that control flow ordering like this is important
>> - it can be critical in embedded systems (such as in the example here by
>> Georg-Johann), but it can also be important for non-embedded systems
>> (such as to minimise the time spent while holding a lock).
> 
> Can you elaborate on this?  I have a hard time thinking of a
> control flow transform that affects volatiles.
> 
> Richard.
> 

I am perhaps not expressing myself very clearly here (and I don't know
the internals of gcc well enough to use the source to help).

Normal (i.e., not "asm") volatile accesses force an order on those
volatile data accesses - if the source code says a volatile read of "x"
then a volatile read of "y", then the compiler has to issue those reads
in that order.  It can't re-order them, or hoist them out of a loop, or
do any other re-ordering optimisations.  Clobbers, inputs and outputs in
inline assembly give a similar ordering on the data flow.  But none of
this affects the /control/ flow.  So the __attribute__((const))
"costly_func" described by Georg-Johann can be moved freely by the
compiler amongst these volatile /data/ accesses.

The C abstract machine does not have any concept of timings, only of
observable accesses (volatile accesses, calls to external code, and
entry/exit from main()).  So it does not distinguish between the sequences:

volX = 1;
y = costly_func(z);
volX = 2;

and

y = costly_func(z);
volX = 1;
volX = 2;

and
volX = 1;
volX = 2;
y = costly_func(z);

(This assumes that costly_func is __attribute__((const)), and y and z
are non-volatile.)

For some real-world usage, however, these sequences are very different.
 In "big" systems, it is unlikely to change correctness.  If "volX" were
part of a locking mechanism, for example, then each version of this code
would be correct - but they might differ in the length of time that the
locks were held, and that could seriously affect performance.  In
embedded systems, low performance could mean failure.  The problem is
exacerbated by small CPUs that need library functions for seemingly
simple operations - gcc might happily move a division operation around
without realising the massive time cost on an 8-bit processor.
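As an aside, one known idiom (a sketch under GCC-specific assumptions,
not something proposed in this thread) that gets the ordering David and
Georg-Johann want with today's GCC is to route the const function's
input and output through empty extended asm statements.  The asm
creates artificial data dependencies, so the call is pinned between the
volatile accesses:

```c
volatile int volX;

/* Stand-in for Georg-Johann's costly_func: pure computation that GCC
   is otherwise free to move across the volatile stores.  */
static int costly_func(int z) __attribute__((const));
static int costly_func(int z) { return z * z; }

int demo(int z)
{
    int y;
    volX = 1;
    /* Empty asm "modifies" z, so costly_func's input now depends on
       this program point and the call cannot be hoisted above the
       first volatile store.  */
    __asm__ volatile ("" : "+r" (z));
    y = costly_func(z);
    /* Empty asm "modifies" y, so the result must exist before the
       second volatile store and the call cannot sink below it.  */
    __asm__ volatile ("" : "+r" (y));
    volX = 2;
    return y;
}
```

This works through ordinary data-flow dependencies rather than any
extra ordering guarantee of volatile asm, which is why it survives the
optimizations discussed above.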

In part

Re: [AVR] remove two maintainers

2014-03-03 Thread Denis Chertykov
2014-03-03 15:35 GMT+04:00 David Brown :
> On 02/03/14 19:24, Denis Chertykov wrote:
>> I would remove two maintainers for AVR port:
>> 1. Anatoly Sokolov 
>> 2. Eric Weddington 
>>
>> I have discussed the removal with Anatoly Sokolov and he agrees with it.
>> I can't discuss the removal with Eric Weddington because his mail
>> address is invalid.
>>
>> Must somebody approve the removal ?  (Or I can just apply it)
>>
>> Denis.
>>
>
> Eric Weddington has left Atmel, so his address will no longer be valid.
>  I don't know if he still has time to work with AVRs, or if he would
> still be able to be a maintainer for the AVR port.  But I am pretty sure
> that his new job will not involve AVRs significantly, so it would only
> be as a hobby (or at best, as a normal avr gcc user).
>
> Atmel includes gcc in their development tool (AVR Studio), as well as
> providing pre-built packages (for Windows and Linux) with the avr-libc
> library and related tools, using snapshots of mainline gcc with a few
> patches (for things like support of newer devices).  So it seems
> reasonable to expect that they will be interested in the development and
> maintenance of the avr port of gcc even though Eric has now left them.
> If you would like, I can try to contact Atmel and ask if they have
> someone who would like to take Eric's seat as a port maintainer (or you
> could do so yourself from Atmel's website).

I'm not looking for additional maintainers; I just want to remove
inactive ones.

Denis.


Re: Request for discussion: Rewrite of inline assembler docs

2014-03-03 Thread Richard Sandiford
dw  writes:
> On 2/27/2014 11:32 PM, Richard Sandiford wrote:
>> dw  writes:
>>> On 2/27/2014 4:11 AM, Richard Sandiford wrote:
 Andrew Haley  writes:
> Over the years there has been a great deal of traffic on these lists
> caused by misunderstandings of GCC's inline assembler.  That's partly
> because it's inherently tricky, but the existing documentation needs
> to be improved.
>
> dw  has done a fairly thorough reworking of
> the documentation.  I've helped a bit.
>
> Section 6.41 of the GCC manual has been rewritten.  It has become:
>
> 6.41 How to Use Inline Assembly Language in C Code
> 6.41.1 Basic Asm - Assembler Instructions with No Operands
> 6.41.2 Extended Asm - Assembler Instructions with C Expression Operands
>
> We could simply post the patch to GCC-patches and have at it, but I
> think it's better to discuss the document here first.  You can read it
> at
>
> http://www.LimeGreenSocks.com/gcc/Basic-Asm.html
> http://www.LimeGreenSocks.com/gcc/Extended-Asm.html
> http://www.LimeGreenSocks.com/gcc/extend04.zip (contains .texi, .patch,
> and affected html pages)
>
> All comments are very welcome.
 Thanks for doing this, looks like a big improvement.
>>> Thanks, I did my best.  I appreciate you taking the time to review them.
>>>
 A couple of comments:

 The section on basic asms says:

 Do not expect a sequence of asm statements to remain perfectly
 consecutive after compilation. To ensure that assembler instructions
 maintain their order, use a single asm statement containing multiple
 instructions. Note that GCC's optimizer can move asm statements
 relative to other code, including across jumps.

 The "maintain their order" might be a bit misleading, since volatile asms
 (including basic asms) must always be executed in the original order.
 Maybe this was meaning placement/address order instead?
>>> This statement is based on this text from the existing docs:
>>>
>>> "Similarly, you can't expect a sequence of volatile |asm| instructions
>>> to remain perfectly consecutive. If you want consecutive output, use a
>>> single |asm|."
>>>
>>> I do not dispute what you are saying.  I just want to confirm that the
>>> existing docs are incorrect before making a change.  Also, see Andi's
>>> response re -fno-toplevel-reorder.
>>>
>>> It seems to me that recommending "single statement" is both the
>>> clearest, and the safest approach here.  But I'm prepared to change my
>>> mind if there is consensus I should.
>> Right.  I agree with that part.  I just thought that the "maintain their
>> order" could be misunderstood as meaning execution order, whereas I think
>> both sentences of the original docs were talking about being "perfectly
>> consecutive" (which to me means "there are no other instructions inbetween").
>
> Hmm.  I'm not seeing the differences here that you do.

Well, like you say, things can be moved across branches.  So, although
this is a very artificial example:

 asm ("x");
 asm ("y");

could become:

 goto bar;

foo:
 asm ("y");
 ...

bar:
 asm ("x");
 goto foo;

This has reordered the instructions in the sense that they have a
different order in memory.  But they are still _executed_ in the same
order.  Actually reordering the execution would be a serious bug.

So I just want to avoid anything that gives the impression that "y" can
be executed before "x" in this example.  I still think:

> Since the existing docs say "GCC's optimizer can move asm statements 
> relative to other code", how would you feel about:
>
> "Do not expect a sequence of |asm| statements to remain perfectly 
> consecutive after compilation. If you want to stop the compiler from 
> reordering or inserting anything into a sequence of assembler 
> instructions, use a single |asm| statement containing multiple 
> instructions. Note that GCC's optimizer can move |asm| statements 
> relative to other code, including across jumps."

...this gives the impression that we might try to execute volatiles
in a different order.
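The distinction being debated can be sketched with GCC extended asm.
The empty templates below stand in for real instructions so the
fragment stays target-independent; this is an illustration, not text
proposed for the docs:

```c
volatile int flag;

void separate_asms(void)
{
    flag = 1;
    /* Two volatile asms: GCC must execute them in this order, but it
       may schedule unrelated instructions between them.  */
    __asm__ volatile ("" /* "x" */ ::: "memory");
    __asm__ volatile ("" /* "y" */ ::: "memory");
    flag = 0;
}

void single_asm(void)
{
    flag = 1;
    /* One asm containing both instructions: the template is opaque to
       GCC, so nothing can be inserted between "x" and "y".  */
    __asm__ volatile ("" /* "x\n\ty" */ ::: "memory");
    flag = 0;
}
```

In both variants the asms execute in source order relative to the
volatile stores; only the single-asm form guarantees the two
instructions are emitted consecutively.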

 It might also be
 worth mentioning that the number of instances of an asm in the output
 may be different from the input.  (Can it increase as well as decrease?
 I'm not sure off-hand, but probably yes.)
>>> So, in the volatile section, how about something like this for decrease:
>>>
>>> "GCC does not delete a volatile |asm| if it is reachable, but may delete
>>> it if it can prove that control flow never reaches the location of the
>>> instruction."
>> It's not just that though.  AIUI it would be OK for:
>>
>>if (foo)
>>  {
>>...
>>asm ("x");
>>  }
>>else
>>  {
>>...
>>asm ("x");
>>  }
>>
>> to become:
>>
>>if (foo)
>>  ...
>>else
>>  ...
>>asm ("x");
>
> Could be.  However, I'm not clear what benefit there wo

Re: [AVR] remove two maintainers

2014-03-03 Thread David Brown
On 02/03/14 19:24, Denis Chertykov wrote:
> I would remove two maintainers for AVR port:
> 1. Anatoly Sokolov 
> 2. Eric Weddington 
> 
> I have discussed the removal with Anatoly Sokolov and he agrees with it.
> I can't discuss the removal with Eric Weddington because his mail
> address is invalid.
> 
> Must somebody approve the removal ?  (Or I can just apply it)
> 
> Denis.
> 

Eric Weddington has left Atmel, so his address will no longer be valid.
 I don't know if he still has time to work with AVRs, or if he would
still be able to be a maintainer for the AVR port.  But I am pretty sure
that his new job will not involve AVRs significantly, so it would only
be as a hobby (or at best, as a normal avr gcc user).

Atmel includes gcc in their development tool (AVR Studio), as well as
providing pre-built packages (for Windows and Linux) with the avr-libc
library and related tools, using snapshots of mainline gcc with a few
patches (for things like support of newer devices).  So it seems
reasonable to expect that they will be interested in the development and
maintenance of the avr port of gcc even though Eric has now left them.
If you would like, I can try to contact Atmel and ask if they have
someone who would like to take Eric's seat as a port maintainer (or you
could do so yourself from Atmel's website).

mvh.,

David




Re: [RFC] Meta-description for tree and gimple folding

2014-03-03 Thread Richard Biener
On Fri, 28 Feb 2014, Kai Tietz wrote:

> Hmm, this all reminds me of the approach Andrew Pinski and I came
> up with two years ago.

You are talking about the gimple folding interface?  Yes, but it's
more similar to what I proposed before that.

> All in all I think it might be worthwhile to
> express folding patterns in a more abstract way.  So the .md-like Lisp
> syntax for this seems only logical.  We already make use of such a
> script language for machine descriptions.

That's the reason I use something similar: that form has proved
useful for instruction selection and peephole-like transforms, and it
is exactly the kind of thing you can generate a matching state machine
for (the current serial try-fail-try-fail processing of cases is bad).

> Nevertheless I doubt that we really want to have the same facility for
> fold-const and gimple.  Instead I got the impression that we would
> prefer to have all folding optimizations in the middle end
> (GIMPLE).  We need folding in the front end (AST) mostly for the
> detection of constant expressions.  Otherwise we want to keep as much
> of the original AST as possible, to get the best results for debugging
> (and even to allow alternative FEs on our middle/back end) and for
> code analyzers.

True, but it's a dead end to rely on FEs implementing their own
folding to be able to remove sth from fold-const.c.  And we don't
exactly have to implement the GENERIC code generation part.

> So I doubt that we want to keep fold-const patterns similar to gimple
> (forward-prop) ones.
> Wouldn't it make more sense to move fold-const patterns completely
> into gimple, and having a facility in FE to ask gimple to *pre*-fold a
> given tree to see if a constant-expression can be achieved?

That was proposed by somebody, yes.  The FE would hand off an
expression to 1) the gimplifier to gimplify it, then 2) to the
gimple folder to simplify it.  Not sure if that's a good design
but yes, it mimics the awkward thing we do now (genericize for
folding in fold_stmt), just the other way around - and it makes
it very costly.

Having a single meta-description of simplifications makes it
possible to have the best of both worlds (no need to GENERICIZE
back from GIMPLE and no need to GIMPLIFY from GENERIC) with
a single point of maintainance.

[the possibility to use offline verification tools for the
transforms comes to my mind as well]

If you think the .md style pattern definitions are too limiting
can you think of sth more powerful without removing the possibility
of implementing the matching with a state machine to make it O(1)?

Thanks,
Richard.


Re: [RFC] Meta-description for tree and gimple folding

2014-03-03 Thread Richard Biener
On Fri, 28 Feb 2014, Diego Novillo wrote:

> On Thu, Feb 27, 2014 at 9:34 AM, Richard Biener  wrote:
> 
> > Comments or suggestions?
> 
> On the surface it looks like a nice idea.  However, I would like to
> understand the scope of this.  Are you thinking of a pattern matcher
> with peephole like actions?  Or would you like to evolve a DSL capable
> of writing compiler passes?  (much like MELT).
> 
> I prefer keeping it simple and limit the scope to pattern matching and
> replacement. There will be other things to handle, however (overflow,
> trapping arithmetic, etc).  The language will grow over time.
> 
> In terms of the mini-language, I don't like the lisp syntax but I
> appreciate that it is a familiar language for GCC since the backend
> uses it extensively.
> 
> Please consider testing facilities as well. We should be able to write
> unit tests in this language to test its syntax validation, semantic
> actions, and transformations.

Ok, so let me summarize some of the goals which should answer most
of the questions above.

 1) The IL matching should be translatable to a state machine
   so it becomes O(1) and not O(number of patterns).  [so we
   should be able to move to "mandatory folding" on stmt
   changes in GIMPLE, similar to how we've done it with
   requiring you to use fold_buildN instead of buildN.
   Currently that has a very high cost because we re-build
   GENERIC on folding GIMPLE and dispatch to fold-const.c] 
 2) The generator should be able to target both GENERIC and
   GIMPLE so we can "move" things from fold-const.c to GIMPLE
   ("move" as in add it to GIMPLE but leave it in fold-const.c
   to not regress frontend dependences on it).
 3) The whole thing should enable transitioning away from
   using force_gimple_operand (fold_buildN ()) by providing
   the GIMPLE equivalent of fold_buildN.  [less trees in the
   middle-end]
 4) The API to the match-and-simplify routine should allow
   (SSA) propagators to use the routine for simplifying
   stmts with their current lattice value (CCP, DOM and
   SCCVN are the obvious candidates here).

So yes, because of 1) the matching part will be similar to
how the machine description handles insns.  The actual
transform can be more complex (and I expect it so, see my
reply to Marc).

I expect the language to grow over the existing proposal
but not so much over time (at least I hope so - heh).

As for testing facilities ... I don't see how I can use
the language to write tests.  Do you mean sth like
(very simplified)

/* Match and simplify CST + CST to CST'.  */
(define_match_and_simplify baz
  (PLUS_EXPR INTEGER_CST_P@0 INTEGER_CST_P@1)
  { int_const_binop (PLUS_EXPR, captures[0], captures[1]); })

(test baz
  (PLUS_EXPR 1 2)
  3)

?

Thanks,
Richard.


Re: [RFC] Meta-description for tree and gimple folding

2014-03-03 Thread Richard Biener
On Fri, 28 Feb 2014, Marc Glisse wrote:

> On Thu, 27 Feb 2014, Richard Biener wrote:
> 
> > I've been hacking on a prototype that generates matching and
> > simplification code from a meta-description.  The goal is
> > to provide a single source of transforms currently spread
> > over the compiler, mostly fold-const.c, gimple-fold.c and
> > tree-ssa-forwprop.c.  Another goal is to make these transforms
> > (which are most of the form of generating a simpler form of
> > a pattern-matched IL piece) more readily available to passes
> > like value-numbering so they can be used on-the-fly, using
> > information provided by the pass lattice.  The ultimate
> > goal is to generate (most of) fold-const.c and gimple-fold.c
> > and tree-ssa-forwprop.c from a single meta description.
> [...]
> > Comments or suggestions?
> 
> It is hard to judge from such simple examples. What if I want to do the
> transformation only if CST+CST' doesn't overflow?

We'd probably allow the generated transform to fail, eventually adding the
ability for a variant like

/* Match and simplify CST + CST to CST'.  */
(define_match_and_simplify baz
  (PLUS_EXPR INTEGER_CST_P@0 INTEGER_CST_P@1)
  {
captures[0] = int_const_binop (PLUS_EXPR, captures[0], captures[1]);
if (TREE_OVERFLOW (captures[0]))
  captures[0] = NULL_TREE;
  }
  @0)

that is, allow a complete manual transform step with a possible
FAIL outcome.

> Can I make it handle commutative operations cleverly?

As the goal is to allow a state machine to be generated for the matching
(to make the matching O(1) independent of the number of patterns)
the generator would have to duplicate the pattern internally.  But yes,
I've thought about commutativeness.

> How do I restrict some subexpression to have
> a single use?

This kind of restriction comes via the valueize() hook - simply
valueize to NULL_TREE to make the match fail (for example,
SSA_NAME_OCCURS_IN_ABNORMAL_PHI could be made to fail that way).

> If I want to check some flag_*, do I have to create a predicate
> that tests for that as well as whatever other test the specific operand
> needed?

You mean like a pattern that is enabled conditional on, for example,
flag_unsafe_math_optimizations?  Didn't think of that yet, but we
can easily add an optional overall pattern condition like we have
in the RTL md files.

> Won't valueize create a lot of not-so-necessary trees when used on
> gimple?

No, valueize should only valueize to SSA names or constants, not
expressions.  Its purpose is to integrate well with an SSA propagator
lattice - for example the tree-ssa-sccvn.c integration (subsuming
its "simplify" implementation) would be

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	(revision 208269)
+++ gcc/tree-ssa-sccvn.c	(working copy)
@@ -3344,35 +3344,23 @@ try_to_simplify (gimple stmt)
   if (code == SSA_NAME)
 return NULL_TREE;
 
-  /* First try constant folding based on our current lattice.  */
+  /* First try constant folding based on our current lattice.
+ ???  Should be subsumed by gimple_match_and_simplify.  */
   tem = gimple_fold_stmt_to_constant_1 (stmt, vn_valueize);
   if (tem
   && (TREE_CODE (tem) == SSA_NAME
  || is_gimple_min_invariant (tem)))
 return tem;
 
-  /* If that didn't work try combining multiple statements.  */
-  switch (TREE_CODE_CLASS (code))
-{
-case tcc_reference:
-      /* Fallthrough for some unary codes that can operate on registers.  */
-  if (!(code == REALPART_EXPR
-   || code == IMAGPART_EXPR
-   || code == VIEW_CONVERT_EXPR
-   || code == BIT_FIELD_REF))
-   break;
-  /* We could do a little more with unary ops, if they expand
-into binary ops, but it's debatable whether it is worth it. */
-case tcc_unary:
-  return simplify_unary_expression (stmt);
-
-case tcc_comparison:
-case tcc_binary:
-  return simplify_binary_expression (stmt);
-
-default:
-  break;
-}
+  /* If that didn't work try combining multiple statements.
+ ???  Handle multiple stmts being generated by storing
+ at most one in VN_INFO->expr?  But then we'd have to
+ transparently support materializing temporary SSA names
+ created by gimple_match_and_simplify - or we never value-number
+ to them.  */
+  if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)
+  return gimple_match_and_simplify (gimple_assign_lhs (stmt),
+   NULL, vn_valueize);
 
   return NULL_TREE;
 }

> If I write a COND_EXPR matcher, could it generate code for phiopt as
> well?

Not sure, what do you have in mind specifically?

> What is the point of gimple_match (without _and_substitute)? Will it
> clean up the new dead statements after itself?

No, the low-level interface won't.  I haven't yet decided whether we
should have an alternate interface taking a stmt iterator rather than an
SSA name and sequence pointer 

Re: Asm volatile causing performance regressions on ARM

2014-03-03 Thread Richard Biener
On Mon, Mar 3, 2014 at 11:41 AM, David Brown  wrote:
> On 28/02/14 13:19, Richard Sandiford wrote:
>> Georg-Johann Lay  writes:
>>> Notice that in code1, func might contain such asm-pairs to implement
>>> atomic operations, but moving costly_func across func does *not*
>>> affect the interrupt response times in such a disastrous way.
>>>
>>> Thus you must be *very* careful w.r.t. optimizing against asm volatile
>>> + memory clobber.  It's too easy to miss some side effects of *real*
>>> code.
>>
>> I understand the example, but I don't think volatile asms guarantee
>> what you want here.
>>
>>> Optimizing code to scrap and pointing to some GCC internal reasoning or some
>>> standard's wording does not help with real code.
>>
>> But how else can a compiler work?  It doesn't just regurgitate canned code,
>> so it can't rely on human intuition as to what "makes sense".  We have to
>> have specific rules and guarantees and say that anything outside those
>> rules and guarantees is undefined.
>>
>> It sounds like you want an asm with an extra-strong ordering guarantee.
>> I think that would need to be an extension, since it would need to consider
>> cases where the asm is used in a function.  (Shades of carries_dependence
>> or whatever in the huge atomic thread.)  I think anything where:
>>
>>   void foo (void) { X; }
>>   void bar (void) { Y1; foo (); Y2; }
>>
>> has different semantics from:
>>
>>   void bar (void) { Y1; X; Y2; }
>>
>> is very dangerous.  And assuming that any function call could enable
>> or disable interrupts, and therefore that nothing can be moved across
>> a non-const function call, would limit things a bit too much.
>>
>> Thanks,
>> Richard
>>
>>
>
> I think the problem stems from "volatile" being a barrier to /data flow/
> changes,

What kind of /data flow/ changes?  It certainly isn't that currently,
only that two volatile accesses always conflict, but not a volatile and a
non-volatile mem:

static int
true_dependence_1 (const_rtx mem, enum machine_mode mem_mode, rtx mem_addr,
   const_rtx x, rtx x_addr, bool mem_canonicalized)
{
...
  if (MEM_VOLATILE_P (x) && MEM_VOLATILE_P (mem))
return 1;

bool
refs_may_alias_p_1 (ao_ref *ref1, ao_ref *ref2, bool tbaa_p)
{
...
  /* Two volatile accesses always conflict.  */
  if (ref1->volatile_p
  && ref2->volatile_p)
return true;

> but what is needed in this case is a barrier to /control flow/
> changes.  To my knowledge, C does not provide any way of doing this, nor
> are there existing gcc extensions to guarantee the ordering.  But it
> certainly is the case that control flow ordering like this is important
> - it can be critical in embedded systems (such as in the example here by
> Georg-Johann), but it can also be important for non-embedded systems
>> (such as to minimise the time spent while holding a lock).

Can you elaborate on this?  I have a hard time thinking of a
control flow transform that affects volatiles.

Richard.


> David
>
>


Re: [AVR] remove two maintainers

2014-03-03 Thread Joern Rennecke
As I am doing some work on avr, I would be available as an additional
maintainer, if you and the steering committee agree.


Re: Asm volatile causing performance regressions on ARM

2014-03-03 Thread David Brown
On 28/02/14 13:19, Richard Sandiford wrote:
> Georg-Johann Lay  writes:
>> Notice that in code1, func might contain such asm-pairs to implement
>> atomic operations, but moving costly_func across func does *not*
>> affect the interrupt response times in such a disastrous way.
>>
>> Thus you must be *very* careful w.r.t. optimizing against asm volatile
>> + memory clobber.  It's too easy to miss some side effects of *real*
>> code.
> 
> I understand the example, but I don't think volatile asms guarantee
> what you want here.
> 
>> Optimizing code to scrap and pointing to some GCC internal reasoning or some 
>> standard's wording does not help with real code.
> 
> But how else can a compiler work?  It doesn't just regurgitate canned code,
> so it can't rely on human intuition as to what "makes sense".  We have to
> have specific rules and guarantees and say that anything outside those
> rules and guarantees is undefined.
> 
> It sounds like you want an asm with an extra-strong ordering guarantee.
> I think that would need to be an extension, since it would need to consider
> cases where the asm is used in a function.  (Shades of carries_dependence
> or whatever in the huge atomic thread.)  I think anything where:
> 
>   void foo (void) { X; }
>   void bar (void) { Y1; foo (); Y2; }
> 
> has different semantics from:
> 
>   void bar (void) { Y1; X; Y2; }
> 
> is very dangerous.  And assuming that any function call could enable
> or disable interrupts, and therefore that nothing can be moved across
> a non-const function call, would limit things a bit too much.
> 
> Thanks,
> Richard
> 
> 

I think the problem stems from "volatile" being a barrier to /data flow/
changes, but what is needed in this case is a barrier to /control flow/
changes.  To my knowledge, C does not provide any way of doing this, nor
are there existing gcc extensions to guarantee the ordering.  But it
certainly is the case that control flow ordering like this is important
- it can be critical in embedded systems (such as in the example here by
Georg-Johann), but it can also be important for non-embedded systems
(such as to minimise the time spent while holding a lock).

David




Re: X86_64 insns combination is not working well

2014-03-03 Thread Richard Biener
On Mon, Mar 3, 2014 at 9:40 AM, Jakub Jelinek  wrote:
> On Mon, Mar 03, 2014 at 11:02:14AM +0800, lin zuojian wrote:
>> I wrote some test code like this:
>> void foo(int * a)
>> {
>> a[0] = 0xfafafafb;
>> a[1] = 0xfafafafc;
>> a[2] = 0xfafafafe;
>> a[3] = 0xfafafaff;
>> a[4] = 0xfafafaf0;
>> a[5] = 0xfafafaf1;
>> a[6] = 0xfafafaf2;
>> a[7] = 0xfafafaf3;
>> a[8] = 0xfafafaf4;
>> a[9] = 0xfafafaf5;
>> a[10] = 0xfafafaf6;
>> a[11] = 0xfafafaf7;
>> a[12] = 0xfafafaf8;
>> a[13] = 0xfafafaf9;
>> a[14] = 0xfafafafa;
>> a[15] = 0xfafaf0fa;
>> }
>> that was what gcc generated:
>>   movl$-84215045, (%rdi)
>>   movl$-84215044, 4(%rdi)
>>   movl$-84215042, 8(%rdi)
>>   movl$-84215041, 12(%rdi)
>>   movl$-84215056, 16(%rdi)
>> ...
>> that was what LLVM/clang generated:
>>   movabsq $-361700855600448773, %rax # imm = 0xFAFAFAFCFAFAFAFB
>>   movq%rax, (%rdi)
>>   movabsq $-361700842715546882, %rax # imm = 0xFAFAFAFFFAFAFAFE
>>   movq%rax, 8(%rdi)
>>   movabsq $-361700902845089040, %rax # imm = 0xFAFAFAF1FAFAFAF0
>>   movq%rax, 16(%rdi)
>>   movabsq $-361700894255154446, %rax # imm = 0xFAFAFAF3FAFAFAF2
>> ...
>> I ran the code on my i7 machine 100 times. Here are the results:
>> gcc:
>> real  0m50.613s
>> user  0m50.559s
>> sys   0m0.000s
>>
>> LLVM/clang:
>> real  0m32.036s
>> user  0m32.001s
>> sys   0m0.000s
>>
>> That means movabsq did a better job!
>> Should gcc peephole pass add such a combine?
>
> This sounds like PR22141, but a microbenchmark isn't the right thing
> to decide this.  From what I remember when playing with the patches,
> movabsq has been mostly bad for performance, at least on the CPUs I've tried
> it back then.  In addition to whether movabsq + movq compared to two movl
> is more beneficial, also alignment plays role here, say if this is in an
> inner loop and not aligned to 64-bits whether it won't slow things down too
> much.

Also the micro-benchmark may be best optimized by memset (, 0xfa, ) and
a set of byte stores?  Also interesting for optimizing this artificial testcase
for -Os ...

Looks like a candidate for collecting sth like

 static const init[] = { 0xfa };
 *ptr = init;

and deciding on an optimal expansion strategy with the help of
target-specific code.  With a good implementation for that we could
avoid most constructor lowering in gimplification (at least most of
the middle-end passes happily look up the constructor values from
inits like the above and from accesses to sub-parts of *ptr).

Richard.

> Jakub


Re: [RFC] Meta-description for tree and gimple folding

2014-03-03 Thread Georg-Johann Lay

On 02/27/2014 03:34 PM, Richard Biener wrote:


I've been hacking on a prototype that generates matching and
simplification code from a meta-description.  The goal is
to provide a single source of transforms currently spread
over the compiler, mostly fold-const.c, gimple-fold.c and
tree-ssa-forwprop.c.  Another goal is to make these transforms
(which are most of the form of generating a simpler form of
a pattern-matched IL piece) more readily available to passes
like value-numbering so they can be used on-the-fly, using
information provided by the pass lattice.  The ultimate
goal is to generate (most of) fold-const.c and gimple-fold.c
and tree-ssa-forwprop.c from a single meta description.

Currently the prototype can generate code to match and simplify
on the GIMPLE IL and it uses a very simple description right now
(following the lispy style we have for machine descriptions).
For example

(define_match_and_simplify foo
   (PLUS_EXPR (MINUS_EXPR integral_op_p@0 @1) @1)
   @0)

Matches (A - B) + B and transforms it to A.  More complex
replacements involving modifying of matches operands can be
done with inlined C code:

(define_match_and_simplify bar
   (PLUS_EXPR INTEGER_CST_P@0 (PLUS_EXPR @1 INTEGER_CST_P@2))
   (PLUS_EXPR { int_const_binop (PLUS_EXPR, captures[0], captures[2]); } @1))

which matches CST1 + (X + CST2) and transforms it to (CST1 + CST2) + X
(thus it reassociates but it also simplifies the constant part).


Hi Richard,

in the past there were some bugs in folding that introduced undefined
behaviour because the folded expression could invoke signed overflow where
the original did not.


One example is PR56899 that folded

   (1 - X) * CST1

to

   X * (-CST1) + CST1

How is this expressed in the pattern?  Is there something like a condition
(like in insns)?  Or will this be encoded in the operation?


Johann


Writing patterns will require a few new predicates like
INTEGER_CST_P or integral_op_p.

At this point I'll try integrating the result into a few
GIMPLE passes (forwprop and SCCVN) to see if the interface
works well enough.  Currently the GIMPLE interface is

tree
gimple_match_and_simplify (tree name, gimple_seq *seq,
   tree (*valueize)(tree));

where the simplification happens on the defining statement
of the SSA name 'name' and an is_gimple_val result is returned.
Any intermediate stmts are appended to 'seq' (or NULL_TREE
is returned if that would be necessary and 'seq' is NULL)
and all SSA names matched and generated are valueized using
the valueize callback (if not NULL).  Thus for the first
example above we'd return A and do not touch seq while
for the second example we'd return a new temporary SSA
name and append name = CST' + X to seq (we might want
to allow in-place modification of the def stmt of name
as well, I'm not sure yet - that's the forwprop way of operation)

Patch below for reference.

Comments or suggestions?

Thanks,
Richard.




Re: [gsoc 2014] moving fold-const patterns to gimple

2014-03-03 Thread Richard Biener
On Sun, Mar 2, 2014 at 9:13 PM, Prathamesh Kulkarni
 wrote:
> Hi, I am an undergraduate student at University of Pune, India, and would
> like to work on moving folding patterns from fold-const.c to gimple.

I've seen the entry on our GSoC project page and edited it to discourage
people from working on that line.  See

http://gcc.gnu.org/ml/gcc/2014-02/msg00516.html

for why.  I think that open-coding the transforms isn't maintainable
in the long run.

> If I understand correctly, constant folding is done on GENERIC (by
> routines in fold-const.c), and then GENERIC is lowered to GIMPLE. The
> purpose of this project,
> is to have constant folding to be performed on GIMPLE instead (in
> gimple-fold.c?)
>
> I have a few elementary questions to ask:
>
> a) A contrived example:
> Consider a C expression, a = ~0 (assume a is int)
> In GENERIC, this would roughly be represented as:
> modify_expr>>
> this gets folded to:
> modify_expr
> and the corresponding gimple tuple generated is (-fdump-tree-gimple-raw):
> gimple_assign 
>
> So, instead of folding performed on GENERIC, it should be
> done on GIMPLE.
> So a tuple like the following should be generated by gimplification:
> 
> and folded to (by call to fold_stmt):
> 
> Is this the expected behavior ?
>
> I have attached a rough/incomplete patch (only stage1 compiled cc1), that
> does the following foldings on bit_not_expr:
> a) ~ INTEGER_CST => folded
> b) ~~x => x
> c) ~(-x) => x - 1
> (For the moment, I put case BIT_NOT_EXPR: return NULL_TREE
> in fold_unary_loc to avoid folding in GENERIC on bit_not_expr)
>
> Is the patch going in the correct direction ? Or have I completely missed
> the point here ? I would be grateful to receive suggestions, and start working
> on a fair patch.

I think you implement what was suggested by Kai (and previously
by me and Andrew, before I changed my mind).

Richard.

> On the following test-case:
> int main()
> {
>   int a, b, c;
>   a = b;
>   c = ~-a;
>   return 0;
> }
>
> The following GIMPLE is generated:
> main ()
> gimple_bind <
>   int D.1748;
>   int D.1749;
>   int D.1750;
>   int D.1751;
>   int D.1752;
>   int a;
>   int b;
>   int c;
>
>   gimple_assign 
>   gimple_assign 
>   gimple_assign 
>   gimple_assign 
>   gimple_return 
>>
>
> The patch generates two tuples for a = b,
> where only one is needed, and extra temporaries, which
> are not removed after the folding. How should I go about
> removing them (or should I not worry, since subsequent passes
> will remove them?)
>
> b) Some front-ends, C for example, require constant folding in certain
> places, like case statements. If constant folding is completely moved off
> to gimple, how shall this be handled?  Shall we gimplify the expression
> immediately if it's required to be evaluated?
>
> Thanks and Regards,
> Prathamesh


Re: Vim format in gcc source?

2014-03-03 Thread lin zuojian
Thanks, Jonathan.
--
Regards
lin zuojian

On Mon, Mar 03, 2014 at 09:37:01AM +, Jonathan Wakely wrote:
> On 3 March 2014 07:00, lin zuojian wrote:
> > Hi guys,
> > How do I set the format of vim, so that my code doesn't look alien?
> 
> Do you mean how do you set vim to match the GCC coding style?
> 
> It's not quite right, and it's mostly used for C++, but I use:
> 
> setl formatoptions=croql cindent cinoptions=:0,g0 
> comments=sr:/*,mb:*,el:*/,://
> setl cinoptions+=,{1s,>2s,n-1s
> setl noet


Re: Vim format in gcc source?

2014-03-03 Thread Jonathan Wakely
On 3 March 2014 07:00, lin zuojian wrote:
> Hi guys,
> How do I set the format of vim, so that my code doesn't look alien?

Do you mean how do you set vim to match the GCC coding style?

It's not quite right, and it's mostly used for C++, but I use:

setl formatoptions=croql cindent cinoptions=:0,g0 comments=sr:/*,mb:*,el:*/,://
setl cinoptions+=,{1s,>2s,n-1s
setl noet


Re: Request for discussion: Rewrite of inline assembler docs

2014-03-03 Thread dw


On 2/27/2014 8:12 PM, Andi Kleen wrote:

dw  writes:

What would you say to something like this:

"Since GCC does not parse the asm, it has no visibility of any static
variables or functions it references.  This may result in those
symbols getting discarded by GCC as unused.  To avoid this problem,
list the symbols as inputs or outputs."

output makes no sense I think, only input.


For static functions, yes.  However won't static data have the same 
problem?  And static data could be input or output.



You still need the part about the top-level asm, where input
doesn't work.


Accessing variables from Basic asm has more problems than this. If you 
are inside a function that accesses globals from both asm and C, the 
results will probably be a mess.  That's why the current docs for Basic 
asm say:


"Safely accessing C data and calling functions from Basic |asm| is more 
complex than it may appear. To access C data, it is better to use 
Extended |asm|. "


However, you are right, more is needed.  How about:

"For asm blocks outside of functions (which must be Basic asm), be aware 
that since GCC does not parse the asm, it has no visibility of any 
static variables or functions it references.  This may result in those 
symbols getting discarded by GCC as unused."



And another common problem:

For top level asm there is no guarantee the compiler outputs the
statements in order.

Well, basic asm (which is the only thing you can use at top level)
already says:

"Do not expect a sequence of |asm| statements to remain perfectly
consecutive after compilation. To ensure that assembler instructions
maintain their order, use a single |asm| statement containing multiple
instructions. Note that GCC's optimizer can move |asm| statements
relative to other code, including across jumps. "

Is something more needed?

Yes, it should be made clear that this applies to top-level asm
too.


I believe what you call "top level" the docs call "Basic asm." This same 
statement is in both sections.



-Andi




Re: X86_64 insns combination is not working well

2014-03-03 Thread Jakub Jelinek
On Mon, Mar 03, 2014 at 11:02:14AM +0800, lin zuojian wrote:
> I wrote some test code like this:
> void foo(int * a)
> {
> a[0] = 0xfafafafb;
> a[1] = 0xfafafafc;
> a[2] = 0xfafafafe;
> a[3] = 0xfafafaff;
> a[4] = 0xfafafaf0;
> a[5] = 0xfafafaf1;
> a[6] = 0xfafafaf2;
> a[7] = 0xfafafaf3;
> a[8] = 0xfafafaf4;
> a[9] = 0xfafafaf5;
> a[10] = 0xfafafaf6;
> a[11] = 0xfafafaf7;
> a[12] = 0xfafafaf8;
> a[13] = 0xfafafaf9;
> a[14] = 0xfafafafa;
> a[15] = 0xfafaf0fa;
> }
> that was what gcc generated:
>   movl$-84215045, (%rdi)
>   movl$-84215044, 4(%rdi)
>   movl$-84215042, 8(%rdi)
>   movl$-84215041, 12(%rdi)
>   movl$-84215056, 16(%rdi)
> ...
> that was what LLVM/clang generated:
>   movabsq $-361700855600448773, %rax # imm = 0xFAFAFAFCFAFAFAFB
>   movq%rax, (%rdi)
>   movabsq $-361700842715546882, %rax # imm = 0xFAFAFAFFFAFAFAFE
>   movq%rax, 8(%rdi)
>   movabsq $-361700902845089040, %rax # imm = 0xFAFAFAF1FAFAFAF0
>   movq%rax, 16(%rdi)
>   movabsq $-361700894255154446, %rax # imm = 0xFAFAFAF3FAFAFAF2
> ...
> I ran the code on my i7 machine 100 times. Here are the results:
> gcc:
> real  0m50.613s
> user  0m50.559s
> sys   0m0.000s
> 
> LLVM/clang:
> real  0m32.036s
> user  0m32.001s
> sys   0m0.000s
> 
> That means movabsq did a better job!
> Should gcc peephole pass add such a combine?

This sounds like PR22141, but a microbenchmark isn't the right thing
to decide this.  From what I remember when playing with the patches,
movabsq has been mostly bad for performance, at least on the CPUs I've tried
it back then.  In addition to whether movabsq + movq compared to two movl
is more beneficial, also alignment plays role here, say if this is in an
inner loop and not aligned to 64-bits whether it won't slow things down too
much.

Jakub