[Bug target/108866] Allow to pass Windows resource file (.rc) as input to gcc

2024-03-24 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108866

--- Comment #9 from Pali Rohár  ---
(In reply to peter0x44 from comment #7)
> 5) windres --help has this list of "supported targets":
> x86_64-w64-mingw32-windres: supported targets: pe-x86-64 pei-x86-64
> pe-bigobj-x86-64 elf64-x86-64 pe-i386 pei-i386 elf32-i386 elf32-iamcu
> elf64-little elf64-big elf32-little elf32-big srec symbolsrec verilog tekhex
> binary ihex plugin

I reported this particular issue to the binutils bugzilla:
https://sourceware.org/bugzilla/show_bug.cgi?id=31543

[Bug middle-end/114449] bswap64 not optimized

2024-03-24 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114449

--- Comment #3 from Pali Rohár  ---
Note that clang optimizes it just with -O2 and does not require any special
pragma.

[Bug middle-end/114449] bswap64 not optimized

2024-03-24 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114449

--- Comment #2 from Pali Rohár  ---
Interesting... I was expecting that -O3, or better -Ofast, would tell gcc to
optimize the code as much as possible.

I added that pragma before the for-loop in the first example, and then gcc
really did optimize the code to just a bswap instruction.
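
A minimal sketch of the loop with an unroll request in front of it. This is an assumption: the comment does not name the pragma, so GCC's "#pragma GCC unroll" with a count of 8 is used here, and the function name bswap64_unrolled is made up for illustration. Whether the result collapses to a single bswap depends on the compiler version.

```c
#include <stddef.h>
#include <stdint.h>

/* Same loop as bswap64_1 from the report, preceded by an unroll request.
   Assumption: "#pragma GCC unroll 8" is the pragma meant in this thread. */
uint64_t bswap64_unrolled(uint64_t num) {
    uint64_t ret = 0;
#pragma GCC unroll 8
    for (size_t i = 0; i < sizeof(num); i++) {
        ret |= ((num >> (8 * (sizeof(num) - 1 - i))) & 0xff) << (8 * i);
    }
    return ret;
}
```

The pragma has to sit directly in front of the loop it applies to; compilers that do not know it typically ignore it with a warning.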

[Bug middle-end/114449] New: bswap64 not optimized

2024-03-24 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114449

            Bug ID: 114449
           Summary: bswap64 not optimized
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pali at kernel dot org
  Target Milestone: ---

https://godbolt.org/z/dc3br9dYT

gcc 13.2 with -O3 does not recognize straightforward code implementing
bswap64. It generates unoptimized code.

uint64_t bswap64_1(uint64_t num) {
    uint64_t ret = 0;
    for (size_t i = 0; i < sizeof(num); i++) {
        ret |= ((num >> (8*(sizeof(num)-1-i))) & 0xff) << (8*i);
    }
    return ret;
}


Rewriting the code to manually unroll the loop causes gcc to produce
optimized code with a single "bswap" instruction on x86-64.

uint64_t bswap64_2(uint64_t num) {
    uint64_t ret = 0;
    ret |= (((num >> 56) & 0xff) <<  0);
    ret |= (((num >> 48) & 0xff) <<  8);
    ret |= (((num >> 40) & 0xff) << 16);
    ret |= (((num >> 32) & 0xff) << 24);
    ret |= (((num >> 24) & 0xff) << 32);
    ret |= (((num >> 16) & 0xff) << 40);
    ret |= (((num >>  8) & 0xff) << 48);
    ret |= (((num >>  0) & 0xff) << 56);
    return ret;
}


Adding the -funroll-all-loops argument does not help for the first example;
gcc still produces unoptimized code.
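
Until the loop form is recognized, one possible workaround is to call the builtin directly and keep the loop only as a fallback. This is only a sketch: the helper name bswap64_portable is hypothetical and not part of the report, and __builtin_bswap64 is assumed to be available on GCC-compatible compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: use the compiler builtin where it exists and fall
   back to the plain loop from the report everywhere else. */
uint64_t bswap64_portable(uint64_t num) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap64(num);
#else
    uint64_t ret = 0;
    for (size_t i = 0; i < sizeof(num); i++) {
        ret |= ((num >> (8 * (sizeof(num) - 1 - i))) & 0xff) << (8 * i);
    }
    return ret;
#endif
}
```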

[Bug middle-end/114448] New: Roundup not optimized

2024-03-24 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114448

            Bug ID: 114448
           Summary: Roundup not optimized
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pali at kernel dot org
  Target Milestone: ---

https://godbolt.org/z/4fPKGzs1M

Straightforward code which rounds up an unsigned number to the next multiple
of 4 is:

(num % 4 == 0) ? num : num + (4 - num % 4);

gcc -O2 generates:

mov edx, edi
mov eax, edi
and edx, -4
add edx, 4
test dil, 3
cmovne  eax, edx
ret


This is not optimal; the branch/test can be avoided by using a double modulo:

num + (4 - num % 4) % 4;

for which gcc -O2 generates:

mov eax, edi
neg eax
and eax, 3
add eax, edi
ret


The optimal implementation of rounding up to a multiple of 4 uses a bit hack:

(num + 3) & ~3;

for which gcc -O2 generates:

lea eax, [rdi+3]
and eax, -4
ret
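
The three forms can be checked against each other with a small harness. This is a sketch with made-up function names (roundup4_ternary, roundup4_modulo, roundup4_mask, roundup4_all_agree); it uses ~3u instead of ~3, which gives the same mask for unsigned operands.

```c
#include <stdbool.h>

/* The three expressions from the report, wrapped in hypothetical functions. */
unsigned roundup4_ternary(unsigned num) {
    return (num % 4 == 0) ? num : num + (4 - num % 4);
}

unsigned roundup4_modulo(unsigned num) {
    return num + (4 - num % 4) % 4;
}

unsigned roundup4_mask(unsigned num) {
    return (num + 3) & ~3u;
}

/* Returns true when all three forms agree for every value in [first, last]. */
bool roundup4_all_agree(unsigned first, unsigned last) {
    if (first > last)
        return true; /* empty range */
    for (unsigned n = first; ; n++) {
        unsigned expected = roundup4_ternary(n);
        if (roundup4_modulo(n) != expected || roundup4_mask(n) != expected)
            return false;
        if (n == last)
            break; /* stop before n can wrap around */
    }
    return true;
}
```

Because unsigned arithmetic wraps, the three forms also agree for the values just below the maximum of unsigned, where the rounded result wraps to 0.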

[Bug target/108866] Allow to pass Windows resource file (.rc) as input to gcc

2024-03-15 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108866

--- Comment #8 from Pali Rohár  ---
Thank you for the input. As you already figured out, there is a lot of work in
this, and I think I'm not skilled enough to implement everything properly, so
I would have to leave it to the gcc developers. I will answer the questions:

> 1) should gcc pass through any arguments to windres?
>  -I --include-dir=<dir>      Include directory when preprocessing rc file
>  -D --define <sym>[=<val>]   Define SYM when preprocessing rc file
>  -U --undefine <sym>         Undefine SYM when preprocessing rc file

windres's -I, -D and -U options are used when processing an "rc" file. So yes,
gcc should propagate -I, -D and -U to windres for "rc" files (but not for
"res" files).

> 2) does -m32 or -m64 need handling in any specific ways?

This is a really good question and I totally forgot about it. Because gcc's
-m32 generates COFF for a different arch than gcc's -m64, the -m32/-m64
switches have to be propagated to windres. I think that gcc's -m32 and -m64
should be "converted" to windres's --target= option (with the correct
argument).
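
The conversion could look roughly like the sketch below. The helper windres_target_for is hypothetical, and the choice of pe-i386 and pe-x86-64 (both taken from the windres target list quoted in this thread) as the matching --target values is an assumption, not something the gcc driver does today.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map a gcc -m32/-m64 switch to a windres --target
   value. Assumption: pe-i386 and pe-x86-64 are the right names to use. */
const char *windres_target_for(const char *mflag) {
    if (mflag == NULL)
        return NULL;
    if (strcmp(mflag, "-m32") == 0)
        return "pe-i386";
    if (strcmp(mflag, "-m64") == 0)
        return "pe-x86-64";
    return NULL; /* not an -m32/-m64 switch: keep the windres default */
}
```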

> 3) the linker has -Wl, for passing arguments to it, does windres need an 
> equivalent?

I think it is not needed at all, because all of windres's flags should already
have corresponding options in gcc.

> 4) windres --help says:
> FORMAT is one of rc, res, or coff, and is deduced from the file name
> should ".res" be handled too?

"rc" is text resource format, "res" is the binary resource format. "coff" is
PE/COFF object file format with binary resource.

windres has the option -J, which explicitly sets the input format (the
extension is then not used for deduction).

So I think that the gcc driver should have rules for both the text (rc) and
binary (res) formats. My "test.spec" experiment has rules for both formats.
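
As an illustration of the per-extension rules, the sketch below picks the value for windres's -J option from the file name. The helper windres_input_format is hypothetical; the real gcc driver selects rules through spec suffixes rather than a function like this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: choose the value for windres's -J option from the
   input file extension. Returns NULL for files that are not resources. */
const char *windres_input_format(const char *path) {
    const char *dot = (path != NULL) ? strrchr(path, '.') : NULL;
    if (dot == NULL)
        return NULL;
    if (strcmp(dot, ".rc") == 0)
        return "rc";  /* text resource script */
    if (strcmp(dot, ".res") == 0)
        return "res"; /* binary resource file */
    return NULL;
}
```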


> 5) windres --help has this list of "supported targets":
> x86_64-w64-mingw32-windres: supported targets: pe-x86-64 pei-x86-64 
> pe-bigobj-x86-64 elf64-x86-64 pe-i386 pei-i386 elf32-i386 elf32-iamcu 
> elf64-little elf64-big elf32-little elf32-big srec symbolsrec verilog tekhex 
> binary ihex plugin
> 
> Do they matter? I did not expect to see any "elf" on this list, because 
> windows obviously doesn't use it.

This is for sure a bug. ELF does not support embedding Windows resource files.
Windows resources can be embedded only into a PE/COFF image file or into a
PE/COFF object file.

And AFAIK, windres supports parsing both PE/COFF image and object files, but
can generate only a PE/COFF object file.

So the windres target list for sure contains nonsense, and that is also the
reason why you got those errors when you specified ELF.

> 6) does llvm-windres need to be considered at all? should there be a way to 
> select it? an -fuse-rc= command option or so?

GNU windres is part of binutils, which also contains GNU as. So if gcc is
using GNU as from binutils for assembling, then it should also use GNU windres
from binutils for processing resources.

So in my opinion, usage of "windres" from gcc should be handled in the same
way as usage of "as" from gcc. If gcc has a way to specify its own as binary,
then it makes sense to allow specifying its own windres binary.

But the gcc developers may have a different opinion.

[Bug target/108866] Allow to pass Windows resource file (.rc) as input to gcc

2024-03-14 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108866

--- Comment #5 from Pali Rohár  ---
Thank you for the info. I read that blog post and, based on those details, I
adjusted the spec file

$ x86_64-w64-mingw32-gcc -dumpspecs > test.spec

by adding additional lines to test.spec:

.rc:
x86_64-w64-mingw32-windres -J rc -O coff -i %i %{c:%W{o*}%{!o*:-o %w%b%O}}%{!c:-o %d%w%u%O}

.res:
x86_64-w64-mingw32-windres -J res -O coff -i %i %{c:%W{o*}%{!o*:-o %w%b%O}}%{!c:-o %d%w%u%O}


rc files contain resources in text format and res files in binary format.

With these changes, x86_64-w64-mingw32-gcc was able to take both a .c and an
.rc file as input and produce an .exe file with the resource.

$ cat test.c
int main() { return 0; }

$ cat test.rc
1 VERSIONINFO
BEGIN
END

$ x86_64-w64-mingw32-gcc -specs=test.spec test.c test.rc -o test.exe


Now show the resource stored in test.exe:

$ x86_64-w64-mingw32-windres -O rc test.exe /dev/stdout

/* Type: version

   Name: 1.  */
LANGUAGE 9, 1

1 VERSIONINFO
BEGIN
END


Replacing the text test.rc file with a binary test.res file also works.


There is one problem with it. I had to hardcode the x86_64-w64-mingw32-windres
name instead of just "windres". How do I declare the cross-compile prefix? gcc
somehow adds it automatically for "as", because the spec file contains just
"as", not "x86_64-w64-mingw32-as".

[Bug target/109317] -Os generates bigger code than -O2 on 32-bit ARM

2024-03-13 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109317

--- Comment #3 from Pali Rohár  ---
Do you need some more input or test data about this issue?

[Bug target/108866] Allow to pass Windows resource file (.rc) as input to gcc

2024-03-13 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108866

--- Comment #2 from Pali Rohár  ---
Andrew, I do not know what the gcc driver is nor what needs to be done in it.
But if you can give me some pointers, I can try it.

Or if you need more details about files, usage, etc., please let me know.

[Bug target/108849] __declspec(code_seg("segname")) does not work

2024-03-13 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108849

--- Comment #3 from Pali Rohár  ---
Arsen, based on my understanding (please correct me if I'm wrong), gcc's
"section" can be used on both code (functions) and data (global variables),
while MS's "code_seg" can be used only on code (functions).

So if gcc adds __declspec(code_seg("segname")) as an alias for
__declspec(section("segname")) for TARGET_DECLSPEC, then it should be OK for
valid source code. However, it would not throw a compile error if
__declspec(code_seg("segname")) is specified on data. But I think that is
acceptable. The primary motivation is support for compiling valid source code.

Are you able to add this alias?
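
Until such an alias exists, source code can hide the difference behind a macro. The sketch below is only an illustration: the macro name CODE_SEG and the function name in_custom_segment are hypothetical, and the section name is assumed to be acceptable to the target's object format (ELF and PE/COFF take a plain name like this).

```c
/* Hypothetical portability macro, not an existing GCC or MSVC facility.
   MSVC gets code_seg; GCC-compatible compilers get the generic section
   attribute, which GCC accepts on both functions and variables. */
#if defined(_MSC_VER) && !defined(__clang__)
#define CODE_SEG(name) __declspec(code_seg(name))
#else
#define CODE_SEG(name) __attribute__((section(name)))
#endif

/* Function placed in the custom section "segname". */
CODE_SEG("segname")
int in_custom_segment(void) { return 0; }
```

With this in place, the objdump -h check from the original report should list the custom section in the object file.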

[Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86

2024-03-13 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

--- Comment #8 from Pali Rohár  ---
Thanks for the quick response and for fixing this issue.

[Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86

2024-03-12 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

--- Comment #3 from Pali Rohár  ---
For details, here is the compiler which produces the mentioned code:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 12.2.0-14'
--with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Debian 12.2.0-14)

I guess that with these configure options you should be able to compile gcc
which produces the mentioned code.

[Bug target/114319] New: htobe64-like function is not optimized on 32-bit x86

2024-03-12 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319

            Bug ID: 114319
           Summary: htobe64-like function is not optimized on 32-bit x86
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pali at kernel dot org
  Target Milestone: ---
            Target: x86

Here is a very simple and straightforward implementation of an htobe64
function which takes a 64-bit number stored in an unsigned long long variable
and encodes it into an unsigned char[] byte buffer.

void test1(unsigned long long val, unsigned char *buf) {
  buf[0] = val >> 56;
  buf[1] = val >> 48;
  buf[2] = val >> 40;
  buf[3] = val >> 32;
  buf[4] = val >> 24;
  buf[5] = val >> 16;
  buf[6] = val >> 8;
  buf[7] = val;
}

Compiling it for 64-bit x86 via "gcc -m64 -O2" produces optimized code:

0000000000000000 <test1>:
   0:   48 0f cf                bswap  %rdi
   3:   48 89 3e                mov    %rdi,(%rsi)
   6:   c3                      retq

But compiling it for 32-bit x86 via "gcc -m32 -O2" produces less optimized
code:

00000000 <test1>:
   0:   8b 54 24 08             mov    0x8(%esp),%edx
   4:   8b 44 24 0c             mov    0xc(%esp),%eax
   8:   89 d1                   mov    %edx,%ecx
   a:   88 70 02                mov    %dh,0x2(%eax)
   d:   c1 e9 18                shr    $0x18,%ecx
  10:   88 50 03                mov    %dl,0x3(%eax)
  13:   88 08                   mov    %cl,(%eax)
  15:   89 d1                   mov    %edx,%ecx
  17:   8b 54 24 04             mov    0x4(%esp),%edx
  1b:   c1 e9 10                shr    $0x10,%ecx
  1e:   0f ca                   bswap  %edx
  20:   88 48 01                mov    %cl,0x1(%eax)
  23:   89 50 04                mov    %edx,0x4(%eax)
  26:   c3                      ret


I tried to compile it for 32-bit powerpc via "powerpc-linux-gnu-gcc -m32 -O2"
and it produces optimized code:

00000000 <test1>:
   0:   90 65 00 00 stw r3,0(r5)
   4:   90 85 00 04 stw r4,4(r5)
   8:   4e 80 00 20 blr

Same for 64-bit powerpc via "powerpc-linux-gnu-gcc -m64 -O2":

0000000000000000 <.test1>:
   0:   f8 64 00 00 std r3,0(r4)
   4:   4e 80 00 20 blr


As a next experiment I tried to rewrite the simple implementation to use gcc
builtins.

void test2(unsigned long long val, unsigned char *buf) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  val = __builtin_bswap64(val);
#endif
  __builtin_memcpy(buf, &val, sizeof(val));
}

If I compile it for 32-bit x86 then I get optimized code:

00000030 <test2>:
  30:   8b 4c 24 0c             mov    0xc(%esp),%ecx
  34:   8b 44 24 04             mov    0x4(%esp),%eax
  38:   8b 54 24 08             mov    0x8(%esp),%edx
  3c:   0f c8                   bswap  %eax
  3e:   89 41 04                mov    %eax,0x4(%ecx)
  41:   0f ca                   bswap  %edx
  43:   89 11                   mov    %edx,(%ecx)
  45:   c3                      ret

If I compile it for 64-bit x86 then I get exactly the same code as for test1:

0000000000000010 <test2>:
  10:   48 0f cf                bswap  %rdi
  13:   48 89 3e                mov    %rdi,(%rsi)
  16:   c3                      retq

I tried to compile it for powerpc too, and the results for test1 and test2
were the same.



So it looks like the issue is specific to 32-bit x86: gcc does not detect
that the test1 function on x86 is doing bswap64.

I ran all tests with the (amd64) Debian gcc; for the powerpc target I used
Debian's powerpc-linux-gnu-gcc cross compiler.
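
The two variants are meant to write the same bytes, which can be checked with a small harness. This is a sketch: the names htobe64_shifts, htobe64_builtin and encoders_agree are made up here (they stand in for test1 and test2 from the report), and unsigned long long is assumed to be 64 bits wide.

```c
#include <string.h>

/* test1 from the report: explicit shifts, most significant byte first. */
void htobe64_shifts(unsigned long long val, unsigned char *buf) {
    buf[0] = val >> 56;
    buf[1] = val >> 48;
    buf[2] = val >> 40;
    buf[3] = val >> 32;
    buf[4] = val >> 24;
    buf[5] = val >> 16;
    buf[6] = val >> 8;
    buf[7] = val;
}

/* test2 from the report: byte swap on little-endian hosts, then copy. */
void htobe64_builtin(unsigned long long val, unsigned char *buf) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    val = __builtin_bswap64(val);
#endif
    __builtin_memcpy(buf, &val, sizeof(val));
}

/* Hypothetical helper: returns 1 when both encoders write identical bytes. */
int encoders_agree(unsigned long long val) {
    unsigned char a[8], b[8];
    htobe64_shifts(val, a);
    htobe64_builtin(val, b);
    return memcmp(a, b, sizeof(a)) == 0;
}
```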

[Bug target/108849] __declspec(code_seg("segname")) does not work

2024-01-07 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108849

--- Comment #2 from Pali Rohár  ---
`section` is the best option. MS says about it:

https://learn.microsoft.com/en-us/cpp/cpp/code-seg-declspec

> The code_seg declaration attribute names an executable text segment in the 
> .obj file in which the object code for the function or class member functions 
> is stored.

> A segment is a named block of data in an .obj file that is loaded into memory 
> as a unit. A text segment is a segment that contains executable code. The 
> term section is often used interchangeably with segment.

> By default, when no code_seg is specified, object code is put in a segment 
> named .text.

[Bug target/108851] gcc -pie generates unwanted PE export table

2023-09-30 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108851

Pali Rohár  changed:

   What     | Removed                                               | Added

   See Also | https://sourceware.org/bugzilla/show_bug.cgi?id=30004 | https://sourceware.org/bugzilla/show_bug.cgi?id=30922

--- Comment #4 from Pali Rohár  ---
No response here, so I reported it to binutils bugtracker:
https://sourceware.org/bugzilla/show_bug.cgi?id=30922

[Bug target/108853] Add new -mcpu=e500 alias for -mcpu=8540

2023-05-08 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108853

--- Comment #5 from Pali Rohár  ---
Back to the original question. Can gcc add a new option -mcpu=e500 as alias to
-mcpu=8540 ?

[Bug target/108851] gcc -pie generates unwanted PE export table

2023-04-25 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108851

--- Comment #3 from Pali Rohár  ---
Or do you have any other suggestions?

[Bug target/109317] -Os generates bigger code than -O2 on 32-bit ARM

2023-04-22 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109317

--- Comment #2 from Pali Rohár  ---
Any idea what can be done with this?

[Bug lto/109369] LTO drops explicitly referenced symbol _pei386_runtime_relocator

2023-04-13 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369

--- Comment #10 from Pali Rohár  ---
> I would suggest to move the bug to the Binutils Bugzilla.

Done: https://sourceware.org/bugzilla/show_bug.cgi?id=30343

[Bug lto/109369] LTO drops explicitly referenced symbol _pei386_runtime_relocator

2023-04-11 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369

--- Comment #8 from Pali Rohár  ---
So from the discussion, do I understand correctly that this is rather an LD
linker issue?

[Bug target/108851] gcc -pie generates unwanted PE export table

2023-04-07 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108851

--- Comment #2 from Pali Rohár  ---
So should I report this issue to binutils bugtracker then?

[Bug lto/109369] LTO drops explicitly referenced symbol _pei386_runtime_relocator

2023-04-01 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369

--- Comment #4 from Pali Rohár  ---
I wanted to point out that marking the _pei386_runtime_relocator() function
with __attribute__((used)) works fine.

As for whether _pei386_runtime_relocator() should participate in LTO at all, I
would rather ask: why not? Is there any specific reason why
_pei386_runtime_relocator() should not be compiled with LTO? I would expect
from gcc/ld that a whole application can be compiled with LTO.
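
The workaround mentioned above amounts to the following variant of pseudo-reloc.c from the test case. It is a sketch only; the empty body is kept from the compile-only test case and is not a real relocator.

```c
/* pseudo-reloc.c variant: the used attribute asks GCC to emit the function
   even when it appears to be unreferenced. */
__attribute__((used))
void _pei386_runtime_relocator(void) { }
```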

[Bug lto/109368] LTO drops entry point symbol

2023-04-01 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109368

--- Comment #4 from Pali Rohár  ---
Reported to binutils: https://sourceware.org/bugzilla/show_bug.cgi?id=30300

[Bug lto/109368] LTO drops entry point symbol

2023-04-01 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109368

--- Comment #2 from Pali Rohár  ---
I do not know. The issue happens when LTO is enabled for GCC.

[Bug lto/109369] New: LTO drops explicitly referenced symbol _pei386_runtime_relocator

2023-04-01 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369

            Bug ID: 109369
           Summary: LTO drops explicitly referenced symbol
                    _pei386_runtime_relocator
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pali at kernel dot org
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---
            Target: Mingw32

When PE runtime-pseudo-reloc is used (e.g. when referencing a member of a
global array from a DLL library without it being marked as dllimport), LTO
drops the _pei386_runtime_relocator symbol even though it is explicitly
referenced from a used symbol, and then the link fails with a complaint that
the _pei386_runtime_relocator symbol was dropped.

This is a bug because the LTO compiler 1) should not drop any symbol which is
explicitly referenced from some used symbol and 2) should not drop the special
_pei386_runtime_relocator symbol when it detects that PE runtime-pseudo-reloc
is used.



Test case:

Create a simple DLL library with a global array arr[]:

$ cat arr.c
int arr[2] = { 1, 2 };

$ i686-w64-mingw32-gcc -shared arr.c -o arr.dll



Define a simple startup file for mingw (so that the full test case compiles
without mingw). The function _pei386_runtime_relocator() is explicitly
referenced from the startup function mainCRTStartup():

$ cat startup.c
extern void _pei386_runtime_relocator(void);
extern int main();

int __main() { }

__attribute__((force_align_arg_pointer))
__attribute__((noinline))
static int _mainCRTStartup(void) {
  _pei386_runtime_relocator();
  return main();
}

__attribute__((used)) /* required due to bug 109368 */
int mainCRTStartup(void) {
  return _mainCRTStartup();
}



Implement PE runtime-pseudo-reloc. For compile-only purposes (without runtime
tests) it can be empty:

$ cat pseudo-reloc.c
void _pei386_runtime_relocator(void) { }



And finally, a simple test program which uses the global array from the DLL
library without it being explicitly marked with dllimport.

$ cat main.c
extern int arr[];

int main() {
  return arr[1];
}



Without LTO this example compiles fine:

$ i686-w64-mingw32-gcc -Os -nostartfiles -nodefaultlibs -nostdlib startup.c
pseudo-reloc.c main.c arr.dll -o test.exe


With LTO enabled this example does not compile, because the explicitly
referenced symbol is dropped:

$ i686-w64-mingw32-gcc -Os -nostartfiles -nodefaultlibs -nostdlib startup.c
pseudo-reloc.c main.c arr.dll -o test.exe -flto
`__pei386_runtime_relocator' referenced in section `.rdata' of
test_exe_ertr04.o: defined in discarded section `.text' of /tmp/ccDpfRvt.o
(symbol from plugin)
collect2: error: ld returned 1 exit status

[Bug lto/109368] New: LTO drops entry point symbol

2023-04-01 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109368

            Bug ID: 109368
           Summary: LTO drops entry point symbol
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pali at kernel dot org
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---
            Target: Mingw32

LTO for PE executables drops the entry point symbol when the default entry
point is used. There is no warning; the PE AddressOfEntryPoint is simply
zeroed, which results in a broken PE binary.

When a non-default entry point is used and specified via the -e option, LTO
does not drop the entry point symbol and generates a working PE executable.



Simple test case which does not use any system library or startup file:

$ cat test-nostartfiles.c
int mainCRTStartup(void) { return 0; }

The default console binary uses the mainCRTStartup() function as its entry
point (as hardcoded in the LD sources).

$ i686-w64-mingw32-gcc -Wall -Wextra -nostartfiles -nodefaultlibs -nostdlib
test-nostartfiles.c -o test-nostartfiles.exe

Without LTO it generates a working PE binary which correctly returns 0 to the
system. It also has the correct AddressOfEntryPoint field in the PE header:

$ i686-w64-mingw32-objdump -p test-nostartfiles.exe | grep AddressOfEntryPoint
AddressOfEntryPoint     00001000



When compiling with LTO, gcc does not emit any warning but generates a broken
PE binary:

$ i686-w64-mingw32-gcc -Wall -Wextra -nostartfiles -nodefaultlibs -nostdlib
test-nostartfiles.c -o test-nostartfiles.exe -flto

It crashes when run, and its AddressOfEntryPoint is zeroed:

$ i686-w64-mingw32-objdump -p test-nostartfiles.exe | grep AddressOfEntryPoint
AddressOfEntryPoint     00000000



When a non-default entry point is used (specified via the -e option), LTO
works correctly and does not drop the entry point.

$ cat test-nostartfiles2.c
int my_entry(void) { return 0; }

$ i686-w64-mingw32-gcc -Wall -Wextra -nostartfiles -nodefaultlibs -nostdlib -e
_my_entry test-nostartfiles2.c -o test-nostartfiles2.exe -flto

$ i686-w64-mingw32-objdump -p test-nostartfiles2.exe | grep AddressOfEntryPoint
AddressOfEntryPoint     00001000

The compiled binary works fine.



So there is a bug in the LTO compiler: it drops the entry point if the default
one is used (i.e. when the entry point is not specified via the -e option).

[Bug target/109317] New: -Os generates bigger code than -O2 on 32-bit ARM

2023-03-28 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109317

            Bug ID: 109317
           Summary: -Os generates bigger code than -O2 on 32-bit ARM
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pali at kernel dot org
  Target Milestone: ---
            Target: arm-linux-gnueabi

Simple loops like the one in the example below are better optimized for size
on 32-bit ARM with the -O2 option than with the -Os option.


$ cat test-arm.c
char *test1(char *ptr) {
  while (*ptr != '\0' || *(ptr+1) != '\0') ptr++;
  ptr++;
  return ptr;
}



$ arm-linux-gnueabi-gcc -O2 -c test-arm.c && arm-linux-gnueabi-objdump -d
test-arm.o

test-arm.o: file format elf32-littlearm


Disassembly of section .text:

00000000 <test1>:
   0:   e5d02000        ldrb    r2, [r0]
   4:   e2800001        add     r0, r0, #1
   8:   e3520000        cmp     r2, #0
   c:   1afffffb        bne     0 <test1>
  10:   e5d02000        ldrb    r2, [r0]
  14:   e3520000        cmp     r2, #0
  18:   1afffff8        bne     0 <test1>
  1c:   e12fff1e        bx      lr



$ arm-linux-gnueabi-gcc -Os -c test-arm.c && arm-linux-gnueabi-objdump -d
test-arm.o

test-arm.o: file format elf32-littlearm


Disassembly of section .text:

00000000 <test1>:
   0:   e1a03000        mov     r3, r0
   4:   e5d32000        ldrb    r2, [r3]
   8:   e2800001        add     r0, r0, #1
   c:   e3520000        cmp     r2, #0
  10:   1afffffa        bne     0 <test1>
  14:   e5d02000        ldrb    r2, [r0]
  18:   e3520000        cmp     r2, #0
  1c:   1afffff7        bne     0 <test1>
  20:   e12fff1e        bx      lr



$ arm-linux-gnueabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-linux-gnueabi-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc-cross/arm-linux-gnueabi/12/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Debian 12.2.0-14'
--with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12 --enable-shared
--enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/
--enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm
--disable-libquadmath --disable-libquadmath-support --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--without-target-system-zlib --enable-multiarch --disable-sjlj-exceptions
--with-arch=armv5te --with-float=soft --disable-werror
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=arm-linux-gnueabi --program-prefix=arm-linux-gnueabi-
--includedir=/usr/arm-linux-gnueabi/include
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Debian 12.2.0-14)
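
For readers who want to see what the loop in test-arm.c computes: it walks forward to the first pair of consecutive NUL bytes and returns a pointer to the second one. The sketch below repeats the function under the hypothetical name skip_to_double_nul and adds a made-up helper, double_nul_offset, for checking the result.

```c
#include <stddef.h>

/* test1 from the report under a descriptive, hypothetical name: advance to
   the first pair of consecutive NUL bytes, return a pointer to the second. */
char *skip_to_double_nul(char *ptr) {
    while (*ptr != '\0' || *(ptr + 1) != '\0') ptr++;
    ptr++;
    return ptr;
}

/* Hypothetical helper: offset of the returned pointer from the buffer start.
   The buffer must contain two consecutive NUL bytes. */
size_t double_nul_offset(char *buf) {
    return (size_t)(skip_to_double_nul(buf) - buf);
}
```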

[Bug target/108853] Add new -mcpu=e500 alias for -mcpu=8540

2023-02-21 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108853

--- Comment #3 from Pali Rohár  ---
I'm still using processors with e500 cores with recent Linux kernel versions,
and I also know other people who are still using them.

Note that NXP still supports some QorIQ processors which have integrated e500
cores. So it is not true that they are no longer supported by Freescale/NXP.

I know that e500 support was mostly removed from GCC, but something is still
there.

And due to this removal, LLVM and clang recently gained a usable e500v2
implementation. I was told that it was heavily tested on FreeBSD with desktop
applications.

Also, musl libc got e500 support within the last year.

So, no, the e500 cpu core is not dead and people still care about it.

[Bug c/108866] New: Allow to pass Windows resource file (.rc) as input to gcc

2023-02-20 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108866

            Bug ID: 108866
           Summary: Allow to pass Windows resource file (.rc) as input to
                    gcc
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pali at kernel dot org
  Target Milestone: ---

Currently it is possible to pass a C source file, an assembler source file, an
object file or a library to gcc as an input argument, and gcc calls the needed
tools to compile and link all input files into one output binary.

But gcc is currently not able to recognize a Windows resource text file (.rc)
when it is passed as an input argument. See:

$ x86_64-w64-mingw32-gcc test-rsrc.rc
/usr/bin/x86_64-w64-mingw32-ld:test-rsrc.rc: file format not recognized;
treating as linker script
/usr/bin/x86_64-w64-mingw32-ld:test-rsrc.rc:1: syntax error
collect2: error: ld returned 1 exit status

Currently a resource file first needs to be passed to the windres compiler,
and then the output object file from windres can be specified as an input
argument to gcc:

$ x86_64-w64-mingw32-windres --input-format=rc --output-format=coff
--input=test-rsrc.rc --output=test-rsrc.o

It would be nice if gcc were able to call windres automatically for a resource
text file, as it does for assembler source, to generate the object file.

[Bug c/108849] New: __declspec(code_seg("segname")) does not work

2023-02-20 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108849

            Bug ID: 108849
           Summary: __declspec(code_seg("segname")) does not work
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pali at kernel dot org
  Target Milestone: ---

Originally reported on: https://sourceware.org/bugzilla/show_bug.cgi?id=30005

GCC/LD does not support the __declspec(code_seg("segname")) declarator for
specifying the name of a PE/COFF segment.

Instead, GCC/LD supports a different, custom syntax,
__declspec(section("segname")), which is incompatible with other compilers
such as MSVC.

Please add support for the de-facto standard "code_seg" declarator to the
PE/COFF __declspec keyword instead of a custom declarator. The custom one does
not bring any value; it just makes code incompatible with gcc.

Test case on Debian sid:

$ x86_64-w64-mingw32-ld -v
GNU ld (GNU Binutils) 2.39.90.20230110
$
$ x86_64-w64-mingw32-gcc -v
Using built-in specs.
COLLECT_GCC=x86_64-w64-mingw32-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-w64-mingw32/12-win32/lto-wrapper
Target: x86_64-w64-mingw32
Configured with: ../../src/configure --build=x86_64-linux-gnu --prefix=/usr
--includedir='/usr/include' --mandir='/usr/share/man'
--infodir='/usr/share/info' --sysconfdir=/etc --localstatedir=/var
--disable-option-checking --disable-silent-rules
--libdir='/usr/lib/x86_64-linux-gnu' --libexecdir='/usr/lib/x86_64-linux-gnu'
--disable-maintainer-mode --disable-dependency-tracking --prefix=/usr
--enable-shared --enable-static --disable-multilib --with-system-zlib
--libexecdir=/usr/lib --without-included-gettext --libdir=/usr/lib
--enable-libstdcxx-time=yes --with-tune=generic --with-headers
--enable-version-specific-runtime-libs --enable-fully-dynamic-string
--enable-libgomp --enable-languages=c,c++,fortran,objc,obj-c++,ada --enable-lto
--enable-threads=win32 --program-suffix=-win32
--program-prefix=x86_64-w64-mingw32- --target=x86_64-w64-mingw32
--with-as=/usr/bin/x86_64-w64-mingw32-as
--with-ld=/usr/bin/x86_64-w64-mingw32-ld --enable-libatomic
--enable-libstdcxx-filesystem-ts=yes --enable-dependency-tracking SED=/bin/sed
Thread model: win32
Supported LTO compression algorithms: zlib
gcc version 12-win32 (GCC)
$
$ cat test-code-seg.c
__declspec(code_seg("segname"))
int test(void) { return 0; }
$
$ x86_64-w64-mingw32-gcc -c test-code-seg.c -o test-code-seg.o
test-code-seg.c:2:1: warning: 'code_seg' attribute directive ignored
[-Wattributes]
2 | int test(void) { return 0; }
  | ^~~
$
$ x86_64-w64-mingw32-objdump -h test-code-seg.o | grep segname
$

[Bug target/108853] New: Add new -mcpu=e500 alias for -mcpu=8540

2023-02-20 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108853

            Bug ID: 108853
           Summary: Add new -mcpu=e500 alias for -mcpu=8540
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pali at kernel dot org
  Target Milestone: ---
            Target: powerpc*

To compile code for the powerpc e500 core, the -mcpu=8540 option has to be
specified. The name 8540 refers to the MPC8540 SoC, which was the first
released HW product with a powerpc e500 core. All other powerpc -mcpu options
in gcc specify core names and not the SoC/product name.

So for consistent naming I would propose adding a new option -mcpu=e500 as an
alias for -mcpu=8540. Note that other projects like binutils/as and LLVM use
the "e500" name for specifying the e500 core, and not 8540 like gcc.

[Bug c/108852] New: Add gcc option for building NT kernel driver

2023-02-20 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108852

            Bug ID: 108852
           Summary: Add gcc option for building NT kernel driver
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pali at kernel dot org
  Target Milestone: ---

gcc already contains options for building different types of PE binaries:
-mconsole for a console executable; -mwindows for a GUI executable; -mdll for
a DLL library; ...

What is missing is an option for building an NT kernel driver. MSVC link.exe
has the /DRIVER option for this.

It would be nice to also have such an option in gcc, one which sets everything
required for building an NT kernel driver, such as not linking startup files,
setting the image base address and alignment, setting the correct entry point
and setting the PE Native subsystem.

[Bug c/108851] New: gcc -pie generates unwanted PE export table

2023-02-20 Thread pali at kernel dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108851

            Bug ID: 108851
           Summary: gcc -pie generates unwanted PE export table
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pali at kernel dot org
  Target Milestone: ---

When gcc is invoked with the -pie option, it automatically generates an
export table for PE executables, even when the executable does not export
anything.

Test case:

$ cat test-pie.c
int func(void) {
    return 42;
}

int main() {
    return func();
}

$ x86_64-w64-mingw32-gcc -pie test-pie.c -o test-pie.exe

$ x86_64-w64-mingw32-objdump -p test-pie.exe | grep -A 20 'There is an export table'
There is an export table in .edata at 0x140008000

The Export Tables (interpreted .edata section contents)

Export Flags                    0
Time/Date stamp                 63f2a29f
Major/Minor                     0/0
Name                            0000000000008028 test-pie.exe
Ordinal Base                    1
Number in:
        Export Address Table            00000000
        [Name Pointer/Ordinal] Table    00000000
Table Addresses
        Export Address Table            0000000000008028
        Name Pointer Table              0000000000008028
        Ordinal Table                   0000000000008028

Export Address Table -- Ordinal Base 1

[Ordinal/Name Pointer] Table

Without gcc's -pie option, the executable does not have an export table.

Note that a similar issue was also reported to LD,
https://sourceware.org/bugzilla/show_bug.cgi?id=30004, and the proposed LD
patch does not change the behavior in this issue.