How to control GCC builtin functions optimization

2019-01-10 Thread Cao jin
Hi,
(please CC me when replying because I am not a subscriber)

I ran into an interesting phenomenon while looking into Linux kernel
compilation. It can be summarized as follows: in
arch/x86/boot/compressed, memcpy is defined as __builtin_memcpy, while
also being implemented as a function. But when memcpy is used, in some
cases GCC optimizes it to inline code, and in other cases GCC just emits
a call to the self-defined memcpy function. This can be confirmed from
the symbol table via `nm bluh.o`.

The compile flags are, for example:
cmd_arch/x86/boot/compressed/pgtable_64.o := gcc
-Wp,-MD,arch/x86/boot/compressed/.pgtable_64.o.d  -nostdinc -isystem
/usr/lib/gcc/x86_64-redhat-linux/8/include -I./arch/x86/include
-I./arch/x86/include/generated  -I./include
-I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi
-I./include/uapi -I./include/generated/uapi -include
./include/linux/kconfig.h -include ./include/linux/compiler_types.h
-D__KERNEL__ -DCONFIG_CC_STACKPROTECTOR -m64 -O2 -fno-strict-aliasing
-fPIE -DDISABLE_BRANCH_PROFILING -mcmodel=small -mno-mmx -mno-sse
-ffreestanding -fno-stack-protector -DKBUILD_BASENAME='"pgtable_64"'
-DKBUILD_MODNAME='"pgtable_64"' -c -o
arch/x86/boot/compressed/pgtable_64.o arch/x86/boot/compressed/pgtable_64.c

Now the question is: from reading the code, this behavior is rather
non-intuitive. Is there any explicit way to control the optimization
behavior precisely?
-- 
Sincerely,
Cao jin
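
[Illustration, not part of the original message.] A minimal sketch of the
two cases described above, under stated assumptions. The names `pair`,
`copy_fixed` and `copy_variable` are hypothetical. As far as I know, whether
GCC expands __builtin_memcpy inline is a heuristic decision that depends
mainly on whether the length is a small compile-time constant. -fno-builtin
(implied by -ffreestanding) only affects names without the __builtin_
prefix, so an explicit __builtin_memcpy is still treated as a builtin. On
x86, -mstringop-strategy= and -minline-all-stringops also influence the
choice.

```c
#include <assert.h>
#include <stddef.h>

struct pair { int a; int b; };   /* hypothetical example type */

/* Length is a small compile-time constant: at -O2 GCC usually expands
   this inline (a few moves) and references no memcpy symbol. */
void copy_fixed(struct pair *dst, const struct pair *src)
{
    __builtin_memcpy(dst, src, sizeof(*dst));
}

/* Length is known only at run time: GCC usually emits a call to memcpy,
   which resolves to whatever memcpy the object is linked against. */
void copy_variable(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    __builtin_memcpy(dst, src, n);
}
```

To see which case applies, compile the file with `gcc -O2 -c` and look for
an undefined `memcpy` entry in the `nm` output of the object file.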




gcc-7-20190110 is now available

2019-01-10 Thread gccadmin
Snapshot gcc-7-20190110 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/7-20190110/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 7 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch 
revision 267824

You'll find:

 gcc-7-20190110.tar.xz    Complete GCC

  SHA256=de374f99d3d81bc6a7d12388ce4800aea87c33a117892b0b99f64dd8cd650285
  SHA1=540b36ba22289ff9b1e5cbcff934062a0b477df2

Diffs from 7-20190103 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-7
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: __has_include__ is problematic

2019-01-10 Thread Jakub Jelinek
On Thu, Jan 10, 2019 at 03:35:14PM +0100, Florian Weimer wrote:
> * Jakub Jelinek:
> 
> > On Thu, Jan 10, 2019 at 03:20:59PM +0100, Florian Weimer wrote:
> >> Can we remove __has_include__?
> >
> > No.
> >
> >> Its availability results in code which is needlessly non-portable
> >> because for some reason, people write __has_include__ instead of
> >> __has_include.  (I don't think there is any difference.)
> >
> > __has_include needs to be a macro, while __has_include__ is a weirdo
> > builtin that does all the magic.  But one needs to be able to
> > #ifdef __has_include
> > etc.
> 
> Why doesn't a synthetic
> 
> #define __has_include __has_include
> 
> work?

Because the magic builtin is a preprocessor builtin, kind of macro,
so you can't have a normal macro with the same name.

Jakub
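
[Illustration, not part of the original message.] A minimal sketch of the
portable spelling discussed in this thread: test for __has_include with
#if defined (or #ifdef), then use __has_include, never __has_include__.
The macro HAVE_STDINT_H and the helper have_stdint_h are hypothetical names
chosen for the example.

```c
/* Test for the feature on its own line first: a preprocessor that lacks
   __has_include cannot parse "__has_include(<...>)" inside an #if. */
#if defined(__has_include)
#  if __has_include(<stdint.h>)
#    include <stdint.h>
#    define HAVE_STDINT_H 1
#  endif
#endif

/* Fallback when the header or the feature test is unavailable. */
#ifndef HAVE_STDINT_H
#  define HAVE_STDINT_H 0
#endif

/* Hypothetical helper: reports whether <stdint.h> was detected. */
int have_stdint_h(void)
{
    return HAVE_STDINT_H;
}
```

Nesting the two #if lines, rather than joining them with &&, keeps the
file usable with compilers that do not provide __has_include at all.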


Re: __has_include__ is problematic

2019-01-10 Thread Florian Weimer
* Jakub Jelinek:

> On Thu, Jan 10, 2019 at 03:20:59PM +0100, Florian Weimer wrote:
>> Can we remove __has_include__?
>
> No.
>
>> Its availability results in code which is needlessly non-portable
>> because for some reason, people write __has_include__ instead of
>> __has_include.  (I don't think there is any difference.)
>
> __has_include needs to be a macro, while __has_include__ is a weirdo
> builtin that does all the magic.  But one needs to be able to
> #ifdef __has_include
> etc.

Why doesn't a synthetic

#define __has_include __has_include

work?

Thanks,
Florian


Re: __has_include__ is problematic

2019-01-10 Thread Jakub Jelinek
On Thu, Jan 10, 2019 at 03:20:59PM +0100, Florian Weimer wrote:
> Can we remove __has_include__?

No.

> Its availability results in code which is needlessly non-portable
> because for some reason, people write __has_include__ instead of
> __has_include.  (I don't think there is any difference.)

__has_include needs to be a macro, while __has_include__ is a weirdo
builtin that does all the magic.  But one needs to be able to
#ifdef __has_include
etc.

Jakub


__has_include__ is problematic

2019-01-10 Thread Florian Weimer
Can we remove __has_include__?

Its availability results in code which is needlessly non-portable
because for some reason, people write __has_include__ instead of
__has_include.  (I don't think there is any difference.)

Thanks,
Florian


Some beginner's questions concerning doing instrumentation/analysis in GIMPLE

2019-01-10 Thread Carter Cheng
Hello,

I am trying to assess whether it is possible to implement a certain idea
as a gcc plugin. I have looked over some of the information on the web and
the gcc internals documentation, but I still cannot figure out some basic
things concerning manipulating GIMPLE in a plugin.

1) How does one add and safely remove basic blocks in GIMPLE if one is
trying to transform a function?

2) How does one label a basic block with a new label (for conditional
branches)?

3) How does one do the same for functions (like in situations when one is
doing interprocedural analysis and function cloning)?

I apologize if this is in a tutorial somewhere but I could not find it.

Regards,

Carter.


Re: autovectorization in gcc

2019-01-10 Thread Jonathan Wakely
On Thu, 10 Jan 2019 at 09:25, Kay F. Jahnke wrote:
> Documentation is absolutely essential. If there is lots of development
> in autovectorization, not documenting this work in a way users can
> simply find is - in my eyes - a grave omission. The text
> 'Auto-vectorization in GCC' looks like it has last been updated in 2011
> (according to the 'Latest News' section). I'm curious to know what new
> capabilities have been added since then.

The page you're looking at documents the project to *add*
autovectorization to GCC. That project was completed many years ago,
and the feature has been present in GCC for years.

I'm not disputing that there could be better documentation, but that
page is not the place to find it. That page should probably get a
notice added saying that the project is complete and that the page is
now only of historical interest.


Re: autovectorization in gcc

2019-01-10 Thread Szabolcs Nagy
On 10/01/2019 08:19, Richard Biener wrote:
> On Wed, 9 Jan 2019, Jakub Jelinek wrote:
> 
>> On Wed, Jan 09, 2019 at 11:10:25AM -0500, David Malcolm wrote:
>>> extern void vf1()
>>> {
>>>    #pragma vectorize enable
>>>    for ( int i = 0 ; i < 32768 ; i++ )
>>>      data [ i ] = std::sqrt ( data [ i ] ) ;
>>> }
>>>
>>> Compiling on this x86_64 box with -fopt-info-vec-missed shows the
>>
>>>   _7 = .SQRT (_1);
>>>   if (_1 u>= 0.0)
>>> goto ; [99.95%]
>>>   else
>>> goto ; [0.05%]
>>>
>>>[local count: 1062472912]:
>>>   goto ; [100.00%]
>>>
>>>[local count: 531495]:
>>>   __builtin_sqrtf (_1);
>>>
>>> I'm not sure where that control flow came from: it isn't in
>>>   sqrt-test.cc.104t.stdarg
>>> but is in
>>>   sqrt-test.cc.105t.cdce
>>> so I think it's coming from the argument-range code in cdce.
>>>
>>> Arguably the location on the statement is wrong: it's on the loop
>>> header, when it presumably should be on the std::sqrt call.
>>
>> See my other mail, it is the result of the -fmath-errno default,
>> the inline emitted sqrt doesn't handle errno setting and we emit
>> essentially x = sqrt (arg); if (__builtin_expect (arg < 0.0, 0)) sqrt (arg);
>> where the former sqrt is inline using HW instructions and the latter is the
>> library call.
>>
>> With some extra work we could vectorize it; e.g. if we make it handle
>> OpenMP #pragma omp ordered simd efficiently, it would be the same thing
>> - allow non-vectorizable portions of vectorized loops by doing there a
>> scalar loop from 0 to vf-1 doing the non-vectorizable stuff + drop the
>> limitation that the vectorized loop is a single bb.  Essentially, in this
>> case it would be
>>   vec1 = vec_load (data + i);
>>   vec2 = vec_sqrt (vec1);
>>   if (__builtin_expect (any (vec2 < 0.0)))
>>     {
>>       for (int i = 0; i < vf; i++)
>>         sqrt (vec2[i]);
>>     }
>>   vec_store (data + i, vec2);
>> If that would turn out to be way too hard, we could for the vectorization
>> purposes hide that into the .SQRT internal fn, say add a fndecl argument to
>> it if it should treat the exceptional cases some way so that the control
>> flow isn't visible in the vectorized loop.
> 
> If we decide it's worth the trouble I'd rather do that in the epilogue
> and thus make the any (vec2 < 0.0) a reduction.  Like
> 
>    smallest = min(smallest, vec1);
> 
> and after the loop do the errno thing on the smallest element.
> 
> That said, this is a transform that is probably worthwhile even
> on scalar code, possibly easiest to code-gen right from the start
> in the call-dce pass.

if this is useful other than errno handling then fine,
but i think it's a really bad idea to add optimization
complexity because of errno handling: nobody checks
errno after sqrt (other than conformance test code).

-fno-math-errno is almost surely closer to what the user
wants than trying to vectorize the errno handling.


Re: Improve syntax error

2019-01-10 Thread Segher Boessenkool
On Sat, Jan 05, 2019 at 06:02:08PM +0100, Daniel Marjamäki wrote:
> > I think the indentation warnings should catch that?
> 
> I get this:
> 
> void f()
> {
>   }
> } // <- error: expected identifier or '(' before '}' token
> 
> I ran with -Wall -Wextra -pedantic and did not see an indentation
> warning. Am I missing some indentation warning? The error message I
> get is a little misplaced. I think it's fine to warn about that } but
> it could also say in the error message that the problem is probably
> the previous }

I opened https://gcc.gnu.org/PR88790 .

> 
> > Should this say something like "expected ) or , or ;"?
> 
> No, none of those suggestions will solve the error.
> 
> Look at this code:
> 
> int x = 3) + 0;
> 
> Writing a ) or , or ; will not fix the syntax error. You have to
> remove the ) or add a ( somewhere.

Yeah, I wasn't quite awake when I wrote that, apparently :-)


Segher


Re: autovectorization in gcc

2019-01-10 Thread Kay F. Jahnke

On 09.01.19 10:50, Andrew Haley wrote:
> On 1/9/19 9:45 AM, Kyrill Tkachov wrote:
>> There's plenty of work being done on auto-vectorisation in GCC.
>> Auto-vectorisation is a performance optimisation and as such is not really
>> a user-visible feature that absolutely requires user documentation.
>
> I don't agree. Sometimes vectorization is critical. It would be nice
> to have a warning which would fire if vectorization failed. That would
> surely help the OP.

Further down this thread, some g++ flags were used which produced
meaningful information about vectorization failures, so the facility is
there - maybe it's not very prominent.


When it comes to user visibility, I'd like to add that there are great 
differences between different users. I spend most of my time writing 
library code, using template metaprogramming in C++. It's essential for 
my code to perform well (real-time visualization), but I don't have 
intimate compiler knowledge - I'm aiming at writing portable, 
standard-compliant code. I'd like the compilers I use to provide 
extensive documentation if I need to track down a problem, and I dislike 
it if I have to use 'special' commands to get things done. Other users 
may produce target-specific code with one specific compiler, and they 
have different needs. It's better to have documentation and not need it 
than the other way round.


So my idea of a 'contract' regarding vectorization is like this:

- the documentation states the scope of vectorization
- the use of a feature can be forced or disallowed
- or left up to a cost model
- the compiler can be made to produce diagnostic output

Documentation is absolutely essential. If there is lots of development 
in autovectorization, not documenting this work in a way users can 
simply find is - in my eyes - a grave omission. The text 
'Auto-vectorization in GCC' looks like it has last been updated in 2011 
(according to the 'Latest News' section). I'm curious to know what new 
capabilities have been added since then. It makes my life much easier if 
I can write loops to follow a given pattern relying on the 
autovectorizer, rather than having to use explicit vector code, having 
to rely on a library. There is also another aspect to being dependent on 
external libraries. When a new architecture comes around, chances are 
the compiler writers will be first to support it. It may take years for 
an external library to add a new target ISA, more time until this runs 
smoothly, and then more time until it has trickled down to the package 
repos of most distributions - if this happens at all. Plus you have the 
danger of betting on the wrong horse, and when the very promising 
library you've used to code your stuff goes offline or commercial, 
you've wasted your precious time. Relying only on the compiler brings 
innovation out most reliably and quickly, and is a good strategy to 
avoid wasting resources.


Now I may be missing things here because I haven't dug deeply enough to 
find documentation about autovectorization in gcc. This was why I have 
asked to be pointed to 'where the action is'. I was hoping to maybe get 
some helpful hints. My main objective is, after all, to 'deliberately 
exploit the compiler's autovectorization facilities by feeding data in
autovectorization-friendly loops'. The code will run, vectorized or not, 
but it would be great to have good guidelines what will or will not be 
autovectorized with a given compiler, rather than having to look at the 
assembler output.


Kay
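
[Illustration, not part of the original message.] A minimal sketch of what
an "autovectorization-friendly loop" can look like, together with the
diagnostic flag mentioned elsewhere in this thread. The function name
`saxpy` and its parameters are hypothetical. The sketch assumes a countable
loop over contiguous arrays that do not overlap (expressed with restrict).
Whether GCC actually vectorizes it depends on the target, the optimization
level and the cost model.

```c
#include <stddef.h>

/* out[i] = a * x[i] + y[i] for i in [0, n).
   restrict promises that the three arrays do not overlap, which removes
   the need for a run-time aliasing check before the vector loop. */
void saxpy(float *restrict out, const float *restrict x,
           const float *restrict y, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a * x[i] + y[i];
}
```

Compiling with `gcc -O3 -fopt-info-vec -fopt-info-vec-missed -c file.c`
should report which loops were vectorized and which were not.
-ftree-vectorize and -fno-tree-vectorize switch the pass on and off, and
-fvect-cost-model= selects how the cost model is applied.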






Re: autovectorization in gcc

2019-01-10 Thread Richard Biener
On Wed, 9 Jan 2019, Jakub Jelinek wrote:

> On Wed, Jan 09, 2019 at 11:10:25AM -0500, David Malcolm wrote:
> > extern void vf1()
> > {
> >    #pragma vectorize enable
> >    for ( int i = 0 ; i < 32768 ; i++ )
> >      data [ i ] = std::sqrt ( data [ i ] ) ;
> > }
> > 
> > Compiling on this x86_64 box with -fopt-info-vec-missed shows the
> 
> >   _7 = .SQRT (_1);
> >   if (_1 u>= 0.0)
> > goto ; [99.95%]
> >   else
> > goto ; [0.05%]
> > 
> >[local count: 1062472912]:
> >   goto ; [100.00%]
> > 
> >[local count: 531495]:
> >   __builtin_sqrtf (_1);
> > 
> > I'm not sure where that control flow came from: it isn't in
> >   sqrt-test.cc.104t.stdarg
> > but is in
> >   sqrt-test.cc.105t.cdce
> > so I think it's coming from the argument-range code in cdce.
> > 
> > Arguably the location on the statement is wrong: it's on the loop
> > header, when it presumably should be on the std::sqrt call.
> 
> See my other mail, it is the result of the -fmath-errno default,
> the inline emitted sqrt doesn't handle errno setting and we emit
> essentially x = sqrt (arg); if (__builtin_expect (arg < 0.0, 0)) sqrt (arg);
> where the former sqrt is inline using HW instructions and the latter is the
> library call.
> 
> With some extra work we could vectorize it; e.g. if we make it handle
> OpenMP #pragma omp ordered simd efficiently, it would be the same thing
> - allow non-vectorizable portions of vectorized loops by doing there a
> scalar loop from 0 to vf-1 doing the non-vectorizable stuff + drop the
> limitation that the vectorized loop is a single bb.  Essentially, in this
> case it would be
>   vec1 = vec_load (data + i);
>   vec2 = vec_sqrt (vec1);
>   if (__builtin_expect (any (vec2 < 0.0)))
>     {
>       for (int i = 0; i < vf; i++)
>         sqrt (vec2[i]);
>     }
>   vec_store (data + i, vec2);
> If that would turn out to be way too hard, we could for the vectorization
> purposes hide that into the .SQRT internal fn, say add a fndecl argument to
> it if it should treat the exceptional cases some way so that the control
> flow isn't visible in the vectorized loop.

If we decide it's worth the trouble I'd rather do that in the epilogue
and thus make the any (vec2 < 0.0) a reduction.  Like

   smallest = min(smallest, vec1);

and after the loop do the errno thing on the smallest element.

That said, this is a transform that is probably worthwhile even
on scalar code, possibly easiest to code-gen right from the start
in the call-dce pass.

Richard.
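
[Illustration, not part of the original message.] A scalar sketch of the
reduction idea above, under stated assumptions: the loop body carries no
errno handling, the smallest input is tracked as a min reduction, and the
error check runs once after the loop. The helpers `sqrt_no_errno` and
`sqrt_array` are hypothetical. The Newton iteration is only a stand-in for
the hardware sqrt instruction so that the sketch needs no libm, and it is
adequate only for moderate inputs. Setting errno to EDOM directly replaces
the library sqrt call that the real transform would make on the smallest
element.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the inline hardware square root: never touches errno. */
static float sqrt_no_errno(float x)
{
    if (!(x > 0.0f))                 /* zero, negative, or NaN input */
        return (x == 0.0f) ? x : (x - x) / (x - x);   /* NaN if x < 0 */
    float r = (x > 1.0f) ? x : 1.0f; /* initial guess, at least sqrt(x) */
    for (int i = 0; i < 64; i++)     /* Newton's iteration */
        r = 0.5f * (r + x / r);
    return r;
}

/* Square-root every element; report a domain error once, after the loop. */
void sqrt_array(float *data, size_t n)
{
    float smallest = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (data[i] < smallest)      /* min reduction over the inputs */
            smallest = data[i];
        data[i] = sqrt_no_errno(data[i]);
    }
    if (smallest < 0.0f)             /* at least one input was negative */
        errno = EDOM;
}
```

The loop body is now free of calls and of control flow that depends on
errno, which is the property the vectorizer needs. The price is that errno
is set after the whole array has been processed, not at the failing element.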