Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-27 Thread Martin Liška

On 3/27/20 3:21 PM, Jeff Law wrote:

On Fri, 2020-03-27 at 10:10 +0100, Martin Liška wrote:

On 3/26/20 5:54 PM, Jeff Law wrote:

On Mon, 2020-03-09 at 17:56 +0100, Martin Liška wrote:

On 3/9/20 4:36 PM, H.J. Lu wrote:

We nee to support different variables, like TLS, data and bss variables.


Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?

About BSS and DATA I agree that it would be handy. I can theoretically
covered with code in get_variable_section/bss_initializer_p. But it's
quite logic and I'm not sure we should simulate it.

@Honza/Richi: Do you have any opinion about that?

Or could we just fix the damn broken configure scripts?  Isn't that what's
driving all this nonsense?


Hello.

Yes. Apparently libtool people were unable to come up with a reasonable
workaround for that. One can read technical details in this PR:
https://sourceware.org/bugzilla/show_bug.cgi?id=25355

What's funny is that discussion doesn't mention the HP-UX bits which handle this
correctly.  However, I think my tests were all done without the common symbol
handling change we made late last year.  So that may complicate things.


Note that the switching to -fno-common was actually the trigger of the issues.
Before the change, one can easily disambiguate global vars and functions based
on the LDPK_COMMON symbol_kind.







The HP-UX bits for this in configure work correctly.I verified that
months
ago and even have a sed script which will take a broken configure script and
fix
it.


How many packages will you have to fix?

IIRC it's less than a half-dozen.  I found them by capturing the generated
config.h files with and without LTO enabled and comparing them.  I don't offhand
know if the differences caused the packages to mis-behave or not -- we just
failed any build where those config.h files differed.



The current LTO plug-in extension will work fine with gcc10 + latest bintuils
release
branch. That's a viable solution for us openSUSE.

If it's working, then great.  But I worry about having to keep binutils/gcc
consistent.  In fact, what brought me into the conversation again was them
getting out of sync in the last few days and binutils tests started failing.


It was probably caused by H.J.'s attempt to use lto-wrapper for nm (for LTO 
bytecode)
which had various issues.

Martin



jeff





Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-27 Thread Jeff Law via Gcc-patches
On Fri, 2020-03-27 at 10:10 +0100, Martin Liška wrote:
> On 3/26/20 5:54 PM, Jeff Law wrote:
> > On Mon, 2020-03-09 at 17:56 +0100, Martin Liška wrote:
> > > On 3/9/20 4:36 PM, H.J. Lu wrote:
> > > > We nee to support different variables, like TLS, data and bss variables.
> > > 
> > > Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?
> > > 
> > > About BSS and DATA I agree that it would be handy. I can theoretically
> > > covered with code in get_variable_section/bss_initializer_p. But it's
> > > quite logic and I'm not sure we should simulate it.
> > > 
> > > @Honza/Richi: Do you have any opinion about that?
> > Or could we just fix the damn broken configure scripts?  Isn't that what's
> > driving all this nonsense?
> 
> Hello.
> 
> Yes. Apparently libtool people were unable to come up with a reasonable
> workaround for that. One can read technical details in this PR:
> https://sourceware.org/bugzilla/show_bug.cgi?id=25355
What's funny is that discussion doesn't mention the HP-UX bits which handle this
correctly.  However, I think my tests were all done without the common symbol
handling change we made late last year.  So that may complicate things.


> 
> > The HP-UX bits for this in configure work correctly.I verified that
> > months
> > ago and even have a sed script which will take a broken configure script and
> > fix
> > it.
> 
> How many packages will you have to fix?
IIRC it's less than a half-dozen.  I found them by capturing the generated
config.h files with and without LTO enabled and comparing them.  I don't offhand
know if the differences caused the packages to mis-behave or not -- we just
failed any build where those config.h files differed.

> 
> The current LTO plug-in extension will work fine with gcc10 + latest bintuils
> release
> branch. That's a viable solution for us openSUSE.
If it's working, then great.  But I worry about having to keep binutils/gcc
consistent.  In fact, what brought me into the conversation again was them
getting out of sync in the last few days and binutils tests started failing.

jeff



Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-27 Thread Martin Liška

On 3/26/20 5:54 PM, Jeff Law wrote:

On Mon, 2020-03-09 at 17:56 +0100, Martin Liška wrote:

On 3/9/20 4:36 PM, H.J. Lu wrote:

We nee to support different variables, like TLS, data and bss variables.


Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?

About BSS and DATA I agree that it would be handy. I can theoretically
covered with code in get_variable_section/bss_initializer_p. But it's
quite logic and I'm not sure we should simulate it.

@Honza/Richi: Do you have any opinion about that?

Or could we just fix the damn broken configure scripts?  Isn't that what's
driving all this nonsense?


Hello.

Yes. Apparently libtool people were unable to come up with a reasonable
workaround for that. One can read technical details in this PR:
https://sourceware.org/bugzilla/show_bug.cgi?id=25355



The HP-UX bits for this in configure work correctly.I verified that months
ago and even have a sed script which will take a broken configure script and fix
it.


How many packages will you have to fix?

The current LTO plug-in extension will work fine with gcc10 + latest bintuils 
release
branch. That's a viable solution for us openSUSE.

Martin



jeff





Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-26 Thread Jeff Law via Gcc-patches
On Mon, 2020-03-09 at 17:56 +0100, Martin Liška wrote:
> On 3/9/20 4:36 PM, H.J. Lu wrote:
> > We nee to support different variables, like TLS, data and bss variables.
> 
> Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?
> 
> About BSS and DATA I agree that it would be handy. I can theoretically
> covered with code in get_variable_section/bss_initializer_p. But it's
> quite logic and I'm not sure we should simulate it.
> 
> @Honza/Richi: Do you have any opinion about that?
Or could we just fix the damn broken configure scripts?  Isn't that what's
driving all this nonsense?

The HP-UX bits for this in configure work correctly.I verified that months
ago and even have a sed script which will take a broken configure script and fix
it.

jeff



Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-19 Thread H.J. Lu via Gcc-patches
On Thu, Mar 19, 2020 at 12:56 PM Richard Biener
 wrote:
>
> On March 19, 2020 5:51:14 PM GMT+01:00, "H.J. Lu"  wrote:
> >On Thu, Mar 19, 2020 at 9:00 AM Martin Liška  wrote:
> >>
> >> On 3/19/20 4:50 PM, H.J. Lu wrote:
> >> > I like it and I will take case of binutils side.
> >> >
> >> > Thanks.
> >>
> >> Great. I've just installed the 2 patches to master.
> >>
> >
> >Here is the binutils patch:
> >
> >https://sourceware.org/pipermail/binutils/2020-March/110295.html
>
> If the plugin advertises v2 support but the LTO IL files lack symtab_ext data 
> what happens?

Did you mean symbol_type == LDST_UNKNOWN? In that case, you get the
text symbol for everything.


-- 
H.J.


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-19 Thread Richard Biener via Gcc-patches
On March 19, 2020 5:51:14 PM GMT+01:00, "H.J. Lu"  wrote:
>On Thu, Mar 19, 2020 at 9:00 AM Martin Liška  wrote:
>>
>> On 3/19/20 4:50 PM, H.J. Lu wrote:
>> > I like it and I will take case of binutils side.
>> >
>> > Thanks.
>>
>> Great. I've just installed the 2 patches to master.
>>
>
>Here is the binutils patch:
>
>https://sourceware.org/pipermail/binutils/2020-March/110295.html

If the plugin advertises v2 support but the LTO IL files lack symtab_ext data 
what happens? 

Richard. 



Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-19 Thread H.J. Lu via Gcc-patches
On Thu, Mar 19, 2020 at 9:00 AM Martin Liška  wrote:
>
> On 3/19/20 4:50 PM, H.J. Lu wrote:
> > I like it and I will take case of binutils side.
> >
> > Thanks.
>
> Great. I've just installed the 2 patches to master.
>

Here is the binutils patch:

https://sourceware.org/pipermail/binutils/2020-March/110295.html

-- 
H.J.


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-19 Thread Martin Liška

On 3/19/20 4:50 PM, H.J. Lu wrote:

I like it and I will take case of binutils side.

Thanks.


Great. I've just installed the 2 patches to master.

Martin


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-19 Thread H.J. Lu via Gcc-patches
On Thu, Mar 19, 2020 at 8:46 AM Richard Biener
 wrote:
>
> On Thu, Mar 19, 2020 at 4:00 PM Martin Liška  wrote:
> >
> > On 3/19/20 10:12 AM, Richard Biener wrote:
> > > On Wed, Mar 18, 2020 at 9:52 AM Martin Liška  wrote:
> > >>
> > >> On 3/18/20 12:27 AM, Jan Hubicka wrote:
> >  Hi.
> > 
> >  There's updated version of the patch.
> >  Changes from the previous version:
> >  - comment added to ld_plugin_symbol
> >  - new section renamed to ext_symtab
> >  - assert added for loop iterations in produce_symtab and 
> >  produce_symtab_extension
> > >>> Hi,
> > >>> I hope this is last version of the patch.
> > >>
> > >> Hello.
> > >>
> > >> Yes.
> > >>
> > 
> >  2020-03-12  Martin Liska  
> > 
> >    * lto-section-in.c: Add extension_symtab.
> > >>> ext_symtab  :)
> > >>
> > >> Fixed.
> > >>
> >  diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
> >  index c17dd69dbdd..78b015be696 100644
> >  --- a/gcc/lto-section-in.c
> >  +++ b/gcc/lto-section-in.c
> >  @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
> >   "mode_table",
> >   "hsa",
> >   "lto",
> >  -  "ipa_sra"
> >  +  "ipa_sra",
> >  +  "ext_symtab"
> > >>> I would move ext_symtab next to symtab so the sections remains at least
> > >>> bit reasonably ordered.
> > >>
> > >> Ok, I'll adjust it and I will send a separate patch where we bump 
> > >> LTO_major_version.
> > >>
> > 
> >  +/* Write extension information for symbols (symbol type, section 
> >  flags).  */
> >  +
> >  +static void
> >  +write_symbol_extension_info (tree t)
> >  +{
> >  +  unsigned char c;
> > >>> Do we still use vertical whitespace after decls per GNU coding style?
> > >>
> > >> Dunno. This seems to me like a nit.
> > >>
> >  diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> >  index 25bf6c468f7..4f82b439360 100644
> >  --- a/gcc/lto-streamer.h
> >  +++ b/gcc/lto-streamer.h
> >  @@ -236,6 +236,7 @@ enum lto_section_type
> >   LTO_section_ipa_hsa,
> >   LTO_section_lto,
> >   LTO_section_ipa_sra,
> >  +  LTO_section_symtab_extension,
> > >>> I guess symtab_ext to match the actual section name?
> > >>
> > >> No. See e.g.   LTO_section_jump_functions - "jmpfuncs". We want to have 
> > >> more descriptive
> > >> enum names.
> > >>
> >   LTO_N_SECTION_TYPES  /* Must be last.  */
> > };
> > 
> >  diff --git a/include/lto-symtab.h b/include/lto-symtab.h
> >  index 0ce0de10121..47f0ff27df8 100644
> >  --- a/include/lto-symtab.h
> >  +++ b/include/lto-symtab.h
> >  @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
> > GCCPV_HIDDEN
> >   };
> > 
> >  +enum gcc_plugin_symbol_type
> >  +{
> >  +  GCCST_UNKNOWN,
> >  +  GCCST_FUNCTION,
> >  +  GCCST_VARIABLE,
> >  +};
> >  +
> >  +enum gcc_plugin_symbol_section_flags
> >  +{
> >  +  GCCSSS_BSS = 1
> >  +};
> > >>>
> > >>> Probably comments here?
> > >>
> > >> No. There are just shadow copy of enum types from plugin-api.h which
> > >> are documented.
> > >>
> >  +
> > #endif /* GCC_LTO_SYMTAB_H  */
> >  +/* Parse an entry of the IL symbol table. The data to be parsed is 
> >  pointed
> >  +   by P and the result is written in ENTRY. The slot number is stored 
> >  in SLOT.
> >  +   Returns the address of the next entry. */
> >  +
> >  +static char *
> >  +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
> >  +{
> >  +  unsigned char t;
> >  +  enum ld_plugin_symbol_type symbol_types[] =
> >  +{
> >  +  LDST_UNKNOWN,
> >  +  LDST_FUNCTION,
> >  +  LDST_VARIABLE,
> >  +};
> >  +
> >  +  t = *p;
> >  +  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
> >  +  entry->symbol_type = symbol_types[t];
> >  +  p++;
> >  +  entry->section_flags = *p;
> >  +  p++;
> >  +
> >  +  return p;
> >  +}
> > >>>
> > >>> I think we have chance to make some plan for future extensions without
> > >>> introducing too many additional sections.
> > >>>
> > >>> Currently there are 2 bytes per entry, while only 3 bits are actively
> > >>> used of them.  If we invent next flag to pass we can use unused bits
> > >>> however we need a way to indicate to plugin that the bit is defined.
> > >>> This could be done by a simple version byte at the beggining of
> > >>> ext_symtab section which will be 0 now and once we define extra bits we
> > >>> bump it up to 1.
> > >>
> > >> I like the suggested change, it can help us in the future.
> > >>
> > >>>
> > >>> It is not that important given that even empty file results in 2k LTO
> > >>> object file, but I think it would be nicer in longer run.
> >  +  /* This is for compatibility with older ABIs.  */
> > >>> Perhaps say here 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-19 Thread Richard Biener via Gcc-patches
On Thu, Mar 19, 2020 at 4:00 PM Martin Liška  wrote:
>
> On 3/19/20 10:12 AM, Richard Biener wrote:
> > On Wed, Mar 18, 2020 at 9:52 AM Martin Liška  wrote:
> >>
> >> On 3/18/20 12:27 AM, Jan Hubicka wrote:
>  Hi.
> 
>  There's updated version of the patch.
>  Changes from the previous version:
>  - comment added to ld_plugin_symbol
>  - new section renamed to ext_symtab
>  - assert added for loop iterations in produce_symtab and 
>  produce_symtab_extension
> >>> Hi,
> >>> I hope this is last version of the patch.
> >>
> >> Hello.
> >>
> >> Yes.
> >>
> 
>  2020-03-12  Martin Liska  
> 
>    * lto-section-in.c: Add extension_symtab.
> >>> ext_symtab  :)
> >>
> >> Fixed.
> >>
>  diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
>  index c17dd69dbdd..78b015be696 100644
>  --- a/gcc/lto-section-in.c
>  +++ b/gcc/lto-section-in.c
>  @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
>   "mode_table",
>   "hsa",
>   "lto",
>  -  "ipa_sra"
>  +  "ipa_sra",
>  +  "ext_symtab"
> >>> I would move ext_symtab next to symtab so the sections remains at least
> >>> bit reasonably ordered.
> >>
> >> Ok, I'll adjust it and I will send a separate patch where we bump 
> >> LTO_major_version.
> >>
> 
>  +/* Write extension information for symbols (symbol type, section 
>  flags).  */
>  +
>  +static void
>  +write_symbol_extension_info (tree t)
>  +{
>  +  unsigned char c;
> >>> Do we still use vertical whitespace after decls per GNU coding style?
> >>
> >> Dunno. This seems to me like a nit.
> >>
>  diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
>  index 25bf6c468f7..4f82b439360 100644
>  --- a/gcc/lto-streamer.h
>  +++ b/gcc/lto-streamer.h
>  @@ -236,6 +236,7 @@ enum lto_section_type
>   LTO_section_ipa_hsa,
>   LTO_section_lto,
>   LTO_section_ipa_sra,
>  +  LTO_section_symtab_extension,
> >>> I guess symtab_ext to match the actual section name?
> >>
> >> No. See e.g.   LTO_section_jump_functions - "jmpfuncs". We want to have 
> >> more descriptive
> >> enum names.
> >>
>   LTO_N_SECTION_TYPES  /* Must be last.  */
> };
> 
>  diff --git a/include/lto-symtab.h b/include/lto-symtab.h
>  index 0ce0de10121..47f0ff27df8 100644
>  --- a/include/lto-symtab.h
>  +++ b/include/lto-symtab.h
>  @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
> GCCPV_HIDDEN
>   };
> 
>  +enum gcc_plugin_symbol_type
>  +{
>  +  GCCST_UNKNOWN,
>  +  GCCST_FUNCTION,
>  +  GCCST_VARIABLE,
>  +};
>  +
>  +enum gcc_plugin_symbol_section_flags
>  +{
>  +  GCCSSS_BSS = 1
>  +};
> >>>
> >>> Probably comments here?
> >>
> >> No. There are just shadow copy of enum types from plugin-api.h which
> >> are documented.
> >>
>  +
> #endif /* GCC_LTO_SYMTAB_H  */
>  +/* Parse an entry of the IL symbol table. The data to be parsed is 
>  pointed
>  +   by P and the result is written in ENTRY. The slot number is stored 
>  in SLOT.
>  +   Returns the address of the next entry. */
>  +
>  +static char *
>  +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
>  +{
>  +  unsigned char t;
>  +  enum ld_plugin_symbol_type symbol_types[] =
>  +{
>  +  LDST_UNKNOWN,
>  +  LDST_FUNCTION,
>  +  LDST_VARIABLE,
>  +};
>  +
>  +  t = *p;
>  +  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
>  +  entry->symbol_type = symbol_types[t];
>  +  p++;
>  +  entry->section_flags = *p;
>  +  p++;
>  +
>  +  return p;
>  +}
> >>>
> >>> I think we have chance to make some plan for future extensions without
> >>> introducing too many additional sections.
> >>>
> >>> Currently there are 2 bytes per entry, while only 3 bits are actively
> >>> used of them.  If we invent next flag to pass we can use unused bits
> >>> however we need a way to indicate to plugin that the bit is defined.
> >>> This could be done by a simple version byte at the beggining of
> >>> ext_symtab section which will be 0 now and once we define extra bits we
> >>> bump it up to 1.
> >>
> >> I like the suggested change, it can help us in the future.
> >>
> >>>
> >>> It is not that important given that even empty file results in 2k LTO
> >>> object file, but I think it would be nicer in longer run.
>  +  /* This is for compatibility with older ABIs.  */
> >>> Perhaps say here that this ABI defined only "int def;"
> >>
> >> Good point.
> >>
> >>>
> >>> The patch look good to me. Thanks for the work!
> >>
> >> Thanks. I'm sending updated patch that I've just tested on lto.exp and
> >> both binutils master and HJ's branch that utilizes the new API.
> >
> > @@ -495,10 +560,16 @@ write_resolution (void)
> >
> > /* 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-19 Thread Martin Liška

On 3/19/20 10:12 AM, Richard Biener wrote:

On Wed, Mar 18, 2020 at 9:52 AM Martin Liška  wrote:


On 3/18/20 12:27 AM, Jan Hubicka wrote:

Hi.

There's updated version of the patch.
Changes from the previous version:
- comment added to ld_plugin_symbol
- new section renamed to ext_symtab
- assert added for loop iterations in produce_symtab and 
produce_symtab_extension

Hi,
I hope this is last version of the patch.


Hello.

Yes.



2020-03-12  Martin Liska  

  * lto-section-in.c: Add extension_symtab.

ext_symtab  :)


Fixed.


diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index c17dd69dbdd..78b015be696 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
 "mode_table",
 "hsa",
 "lto",
-  "ipa_sra"
+  "ipa_sra",
+  "ext_symtab"

I would move ext_symtab next to symtab so the sections remains at least
bit reasonably ordered.


Ok, I'll adjust it and I will send a separate patch where we bump 
LTO_major_version.



+/* Write extension information for symbols (symbol type, section flags).  */
+
+static void
+write_symbol_extension_info (tree t)
+{
+  unsigned char c;

Do we still use vertical whitespace after decls per GNU coding style?


Dunno. This seems to me like a nit.


diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 25bf6c468f7..4f82b439360 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -236,6 +236,7 @@ enum lto_section_type
 LTO_section_ipa_hsa,
 LTO_section_lto,
 LTO_section_ipa_sra,
+  LTO_section_symtab_extension,

I guess symtab_ext to match the actual section name?


No. See e.g.   LTO_section_jump_functions - "jmpfuncs". We want to have more 
descriptive
enum names.


 LTO_N_SECTION_TYPES  /* Must be last.  */
   };

diff --git a/include/lto-symtab.h b/include/lto-symtab.h
index 0ce0de10121..47f0ff27df8 100644
--- a/include/lto-symtab.h
+++ b/include/lto-symtab.h
@@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
   GCCPV_HIDDEN
 };

+enum gcc_plugin_symbol_type
+{
+  GCCST_UNKNOWN,
+  GCCST_FUNCTION,
+  GCCST_VARIABLE,
+};
+
+enum gcc_plugin_symbol_section_flags
+{
+  GCCSSS_BSS = 1
+};


Probably comments here?


No. There are just shadow copy of enum types from plugin-api.h which
are documented.


+
   #endif /* GCC_LTO_SYMTAB_H  */
+/* Parse an entry of the IL symbol table. The data to be parsed is pointed
+   by P and the result is written in ENTRY. The slot number is stored in SLOT.
+   Returns the address of the next entry. */
+
+static char *
+parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
+{
+  unsigned char t;
+  enum ld_plugin_symbol_type symbol_types[] =
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE,
+};
+
+  t = *p;
+  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
+  entry->symbol_type = symbol_types[t];
+  p++;
+  entry->section_flags = *p;
+  p++;
+
+  return p;
+}


I think we have chance to make some plan for future extensions without
introducing too many additional sections.

Currently there are 2 bytes per entry, while only 3 bits are actively
used of them.  If we invent next flag to pass we can use unused bits
however we need a way to indicate to plugin that the bit is defined.
This could be done by a simple version byte at the beggining of
ext_symtab section which will be 0 now and once we define extra bits we
bump it up to 1.


I like the suggested change, it can help us in the future.



It is not that important given that even empty file results in 2k LTO
object file, but I think it would be nicer in longer run.

+  /* This is for compatibility with older ABIs.  */

Perhaps say here that this ABI defined only "int def;"


Good point.



The patch look good to me. Thanks for the work!


Thanks. I'm sending updated patch that I've just tested on lto.exp and
both binutils master and HJ's branch that utilizes the new API.


@@ -495,10 +560,16 @@ write_resolution (void)

/* Version 2 of API supports IRONLY_EXP resolution that is
   accepted by GCC-4.7 and newer.  */
-  if (get_symbols_v2)
-get_symbols_v2 (info->handle, symtab->nsyms, syms);
+  if (get_symbols_v4)
+   get_symbols_v4 (info->handle, symtab->nsyms, syms);
else
-get_symbols (info->handle, symtab->nsyms, syms);
+   {
+ clear_new_symtab_flags (symtab);

can you instead just avoid parsing the ext symtab?


Yes, I simplified the changes and I bet we'll only need one new hook 
get_symbols_v2.
Then we can base parsing of the external symtab on that.



+ if (get_symbols_v2)
+   get_symbols_v2 (info->handle, symtab->nsyms, syms);
+ else
+   get_symbols (info->handle, symtab->nsyms, syms);
+   }

I guess this also points to the fact that LDs symbol resolution
can't tell GCC it chose "BSS" (from a non-IL object) or that
it chose a variable or function.

@@ -296,6 +300,8 @@ parse_table_entry (char 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-19 Thread Richard Biener via Gcc-patches
On Wed, Mar 18, 2020 at 9:52 AM Martin Liška  wrote:
>
> On 3/18/20 12:27 AM, Jan Hubicka wrote:
> >> Hi.
> >>
> >> There's updated version of the patch.
> >> Changes from the previous version:
> >> - comment added to ld_plugin_symbol
> >> - new section renamed to ext_symtab
> >> - assert added for loop iterations in produce_symtab and 
> >> produce_symtab_extension
> > Hi,
> > I hope this is last version of the patch.
>
> Hello.
>
> Yes.
>
> >>
> >> 2020-03-12  Martin Liska  
> >>
> >>  * lto-section-in.c: Add extension_symtab.
> > ext_symtab  :)
>
> Fixed.
>
> >> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
> >> index c17dd69dbdd..78b015be696 100644
> >> --- a/gcc/lto-section-in.c
> >> +++ b/gcc/lto-section-in.c
> >> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
> >> "mode_table",
> >> "hsa",
> >> "lto",
> >> -  "ipa_sra"
> >> +  "ipa_sra",
> >> +  "ext_symtab"
> > I would move ext_symtab next to symtab so the sections remains at least
> > bit reasonably ordered.
>
> Ok, I'll adjust it and I will send a separate patch where we bump 
> LTO_major_version.
>
> >>
> >> +/* Write extension information for symbols (symbol type, section flags).  
> >> */
> >> +
> >> +static void
> >> +write_symbol_extension_info (tree t)
> >> +{
> >> +  unsigned char c;
> > Do we still use vertical whitespace after decls per GNU coding style?
>
> Dunno. This seems to me like a nit.
>
> >> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> >> index 25bf6c468f7..4f82b439360 100644
> >> --- a/gcc/lto-streamer.h
> >> +++ b/gcc/lto-streamer.h
> >> @@ -236,6 +236,7 @@ enum lto_section_type
> >> LTO_section_ipa_hsa,
> >> LTO_section_lto,
> >> LTO_section_ipa_sra,
> >> +  LTO_section_symtab_extension,
> > I guess symtab_ext to match the actual section name?
>
> No. See e.g.   LTO_section_jump_functions - "jmpfuncs". We want to have more 
> descriptive
> enum names.
>
> >> LTO_N_SECTION_TYPES  /* Must be last.  */
> >>   };
> >>
> >> diff --git a/include/lto-symtab.h b/include/lto-symtab.h
> >> index 0ce0de10121..47f0ff27df8 100644
> >> --- a/include/lto-symtab.h
> >> +++ b/include/lto-symtab.h
> >> @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
> >>   GCCPV_HIDDEN
> >> };
> >>
> >> +enum gcc_plugin_symbol_type
> >> +{
> >> +  GCCST_UNKNOWN,
> >> +  GCCST_FUNCTION,
> >> +  GCCST_VARIABLE,
> >> +};
> >> +
> >> +enum gcc_plugin_symbol_section_flags
> >> +{
> >> +  GCCSSS_BSS = 1
> >> +};
> >
> > Probably comments here?
>
> No. There are just shadow copy of enum types from plugin-api.h which
> are documented.
>
> >> +
> >>   #endif /* GCC_LTO_SYMTAB_H  */
> >> +/* Parse an entry of the IL symbol table. The data to be parsed is pointed
> >> +   by P and the result is written in ENTRY. The slot number is stored in 
> >> SLOT.
> >> +   Returns the address of the next entry. */
> >> +
> >> +static char *
> >> +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
> >> +{
> >> +  unsigned char t;
> >> +  enum ld_plugin_symbol_type symbol_types[] =
> >> +{
> >> +  LDST_UNKNOWN,
> >> +  LDST_FUNCTION,
> >> +  LDST_VARIABLE,
> >> +};
> >> +
> >> +  t = *p;
> >> +  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
> >> +  entry->symbol_type = symbol_types[t];
> >> +  p++;
> >> +  entry->section_flags = *p;
> >> +  p++;
> >> +
> >> +  return p;
> >> +}
> >
> > I think we have chance to make some plan for future extensions without
> > introducing too many additional sections.
> >
> > Currently there are 2 bytes per entry, while only 3 bits are actively
> > used of them.  If we invent next flag to pass we can use unused bits
> > however we need a way to indicate to plugin that the bit is defined.
> > This could be done by a simple version byte at the beggining of
> > ext_symtab section which will be 0 now and once we define extra bits we
> > bump it up to 1.
>
> I like the suggested change, it can help us in the future.
>
> >
> > It is not that important given that even empty file results in 2k LTO
> > object file, but I think it would be nicer in longer run.
> >> +  /* This is for compatibility with older ABIs.  */
> > Perhaps say here that this ABI defined only "int def;"
>
> Good point.
>
> >
> > The patch look good to me. Thanks for the work!
>
> Thanks. I'm sending updated patch that I've just tested on lto.exp and
> both binutils master and HJ's branch that utilizes the new API.

@@ -495,10 +560,16 @@ write_resolution (void)

   /* Version 2 of API supports IRONLY_EXP resolution that is
  accepted by GCC-4.7 and newer.  */
-  if (get_symbols_v2)
-get_symbols_v2 (info->handle, symtab->nsyms, syms);
+  if (get_symbols_v4)
+   get_symbols_v4 (info->handle, symtab->nsyms, syms);
   else
-get_symbols (info->handle, symtab->nsyms, syms);
+   {
+ clear_new_symtab_flags (symtab);

can you instead just avoid parsing the ext symtab?

+ if 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-18 Thread Martin Liška

On 3/18/20 12:27 AM, Jan Hubicka wrote:

Hi.

There's updated version of the patch.
Changes from the previous version:
- comment added to ld_plugin_symbol
- new section renamed to ext_symtab
- assert added for loop iterations in produce_symtab and 
produce_symtab_extension

Hi,
I hope this is last version of the patch.


Hello.

Yes.



2020-03-12  Martin Liska  

* lto-section-in.c: Add extension_symtab.

ext_symtab  :)


Fixed.


diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index c17dd69dbdd..78b015be696 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
"mode_table",
"hsa",
"lto",
-  "ipa_sra"
+  "ipa_sra",
+  "ext_symtab"

I would move ext_symtab next to symtab so the sections remains at least
bit reasonably ordered.


Ok, I'll adjust it and I will send a separate patch where we bump 
LTO_major_version.

  
+/* Write extension information for symbols (symbol type, section flags).  */

+
+static void
+write_symbol_extension_info (tree t)
+{
+  unsigned char c;

Do we still use vertical whitespace after decls per GNU coding style?


Dunno. This seems to me like a nit.


diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 25bf6c468f7..4f82b439360 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -236,6 +236,7 @@ enum lto_section_type
LTO_section_ipa_hsa,
LTO_section_lto,
LTO_section_ipa_sra,
+  LTO_section_symtab_extension,

I guess symtab_ext to match the actual section name?


No. See e.g.   LTO_section_jump_functions - "jmpfuncs". We want to have more 
descriptive
enum names.


LTO_N_SECTION_TYPES /* Must be last.  */
  };
  
diff --git a/include/lto-symtab.h b/include/lto-symtab.h

index 0ce0de10121..47f0ff27df8 100644
--- a/include/lto-symtab.h
+++ b/include/lto-symtab.h
@@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
  GCCPV_HIDDEN
};
  
+enum gcc_plugin_symbol_type

+{
+  GCCST_UNKNOWN,
+  GCCST_FUNCTION,
+  GCCST_VARIABLE,
+};
+
+enum gcc_plugin_symbol_section_flags
+{
+  GCCSSS_BSS = 1
+};


Probably comments here?


No. There are just shadow copy of enum types from plugin-api.h which
are documented.


+
  #endif /* GCC_LTO_SYMTAB_H  */
+/* Parse an entry of the IL symbol table. The data to be parsed is pointed
+   by P and the result is written in ENTRY. The slot number is stored in SLOT.
+   Returns the address of the next entry. */
+
+static char *
+parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
+{
+  unsigned char t;
+  enum ld_plugin_symbol_type symbol_types[] =
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE,
+};
+
+  t = *p;
+  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
+  entry->symbol_type = symbol_types[t];
+  p++;
+  entry->section_flags = *p;
+  p++;
+
+  return p;
+}


I think we have chance to make some plan for future extensions without
introducing too many additional sections.

Currently there are 2 bytes per entry, while only 3 bits are actively
used of them.  If we invent next flag to pass we can use unused bits
however we need a way to indicate to plugin that the bit is defined.
This could be done by a simple version byte at the beggining of
ext_symtab section which will be 0 now and once we define extra bits we
bump it up to 1.


I like the suggested change, it can help us in the future.



It is not that important given that even empty file results in 2k LTO
object file, but I think it would be nicer in longer run.

+  /* This is for compatibility with older ABIs.  */

Perhaps say here that this ABI defined only "int def;"


Good point.



The patch look good to me. Thanks for the work!


Thanks. I'm sending updated patch that I've just tested on lto.exp and
both binutils master and HJ's branch that utilizes the new API.

Martin


Honza

+#ifdef __BIG_ENDIAN__
+  char unused;
+  char section_flags;
+  char symbol_type;
+  char def;
+#else
+  char def;
+  char symbol_type;
+  char section_flags;
+  char unused;
+#endif
int visibility;
uint64_t size;
char *comdat_key;
@@ -123,6 +134,20 @@ enum ld_plugin_symbol_visibility
LDPV_HIDDEN
  };
  
+/* The type of the symbol.  */

+
+enum ld_plugin_symbol_type
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE,
+};
+
+enum ld_plugin_symbol_section_flags
+{
+  LDSSS_BSS = 1
+};
+
  /* How a symbol is resolved.  */
  
  enum ld_plugin_symbol_resolution

@@ -431,7 +456,9 @@ enum ld_plugin_tag
LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
LDPT_GET_INPUT_SECTION_SIZE = 30,
LDPT_REGISTER_NEW_INPUT_HOOK = 31,
-  LDPT_GET_WRAP_SYMBOLS = 32
+  LDPT_GET_WRAP_SYMBOLS = 32,
+  LDPT_ADD_SYMBOLS_V2 = 33,
+  LDPT_GET_SYMBOLS_V4 = 34,
  };
  
  /* The plugin transfer vector.  */

--
2.25.1





>From 492e7dc5b5f792b2e9f92b5fc77e47fe9ee98da7 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH 2/3] API extension for binutils (type of symbols).


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-17 Thread Jan Hubicka
> Hi.
> 
> There's updated version of the patch.
> Changes from the previous version:
> - comment added to ld_plugin_symbol
> - new section renamed to ext_symtab
> - assert added for loop iterations in produce_symtab and 
> produce_symtab_extension
Hi,
I hope this is last version of the patch.
> 
> 2020-03-12  Martin Liska  
> 
>   * lto-section-in.c: Add extension_symtab.
ext_symtab  :)
> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
> index c17dd69dbdd..78b015be696 100644
> --- a/gcc/lto-section-in.c
> +++ b/gcc/lto-section-in.c
> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
>"mode_table",
>"hsa",
>"lto",
> -  "ipa_sra"
> +  "ipa_sra",
> +  "ext_symtab"
I would move ext_symtab next to symtab so the sections remains at least
bit reasonably ordered.
>  
> +/* Write extension information for symbols (symbol type, section flags).  */
> +
> +static void
> +write_symbol_extension_info (tree t)
> +{
> +  unsigned char c;
Do we still use vertical whitespace after decls per GNU coding style?
> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> index 25bf6c468f7..4f82b439360 100644
> --- a/gcc/lto-streamer.h
> +++ b/gcc/lto-streamer.h
> @@ -236,6 +236,7 @@ enum lto_section_type
>LTO_section_ipa_hsa,
>LTO_section_lto,
>LTO_section_ipa_sra,
> +  LTO_section_symtab_extension,
I guess symtab_ext to match the actual section name?
>LTO_N_SECTION_TYPES/* Must be last.  */
>  };
>  
> diff --git a/include/lto-symtab.h b/include/lto-symtab.h
> index 0ce0de10121..47f0ff27df8 100644
> --- a/include/lto-symtab.h
> +++ b/include/lto-symtab.h
> @@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
>  GCCPV_HIDDEN
>};
>  
> +enum gcc_plugin_symbol_type
> +{
> +  GCCST_UNKNOWN,
> +  GCCST_FUNCTION,
> +  GCCST_VARIABLE,
> +};
> +
> +enum gcc_plugin_symbol_section_flags
> +{
> +  GCCSSS_BSS = 1
> +};

Probably comments here?
> +
>  #endif /* GCC_LTO_SYMTAB_H  */
> +/* Parse an entry of the IL symbol table. The data to be parsed is pointed
> +   by P and the result is written in ENTRY. The slot number is stored in 
> SLOT.
> +   Returns the address of the next entry. */
> +
> +static char *
> +parse_table_entry_extension (char *p, struct ld_plugin_symbol *entry)
> +{
> +  unsigned char t;
> +  enum ld_plugin_symbol_type symbol_types[] =
> +{
> +  LDST_UNKNOWN,
> +  LDST_FUNCTION,
> +  LDST_VARIABLE,
> +};
> +
> +  t = *p;
> +  check (t <= 3, LDPL_FATAL, "invalid symbol type found");
> +  entry->symbol_type = symbol_types[t];
> +  p++;
> +  entry->section_flags = *p;
> +  p++;
> +
> +  return p;
> +}

I think we have chance to make some plan for future extensions without
introducing too many additional sections.

Currently there are 2 bytes per entry, while only 3 bits are actively
used of them.  If we invent next flag to pass we can use unused bits
however we need a way to indicate to plugin that the bit is defined.
This could be done by a simple version byte at the beggining of
ext_symtab section which will be 0 now and once we define extra bits we
bump it up to 1.

It is not that important given that even empty file results in 2k LTO
object file, but I think it would be nicer in longer run.
> +  /* This is for compatibility with older ABIs.  */
Perhaps say here that this ABI defined only "int def;"

The patch look good to me. Thanks for the work!
Honza
> +#ifdef __BIG_ENDIAN__
> +  char unused;
> +  char section_flags;
> +  char symbol_type;
> +  char def;
> +#else
> +  char def;
> +  char symbol_type;
> +  char section_flags;
> +  char unused;
> +#endif
>int visibility;
>uint64_t size;
>char *comdat_key;
> @@ -123,6 +134,20 @@ enum ld_plugin_symbol_visibility
>LDPV_HIDDEN
>  };
>  
> +/* The type of the symbol.  */
> +
> +enum ld_plugin_symbol_type
> +{
> +  LDST_UNKNOWN,
> +  LDST_FUNCTION,
> +  LDST_VARIABLE,
> +};
> +
> +enum ld_plugin_symbol_section_flags
> +{
> +  LDSSS_BSS = 1
> +};
> +
>  /* How a symbol is resolved.  */
>  
>  enum ld_plugin_symbol_resolution
> @@ -431,7 +456,9 @@ enum ld_plugin_tag
>LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
>LDPT_GET_INPUT_SECTION_SIZE = 30,
>LDPT_REGISTER_NEW_INPUT_HOOK = 31,
> -  LDPT_GET_WRAP_SYMBOLS = 32
> +  LDPT_GET_WRAP_SYMBOLS = 32,
> +  LDPT_ADD_SYMBOLS_V2 = 33,
> +  LDPT_GET_SYMBOLS_V4 = 34,
>  };
>  
>  /* The plugin transfer vector.  */
> -- 
> 2.25.1
> 



Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-16 Thread Martin Liška

On 3/16/20 3:34 PM, H.J. Lu wrote:

It is LDPT_ADD_SYMBOLS_V2, not LDPT_GET_SYMBOLS_V2.


Sorry. It's fixed now.

Martin


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-16 Thread H.J. Lu via Gcc-patches
On Mon, Mar 16, 2020 at 6:50 AM Martin Liška  wrote:
>
> On 3/16/20 12:12 PM, H.J. Lu wrote:
> > (enum ld_plugin_tag): Add LDPT_GET_SYMBOLS_V and
> > ^^^ Typo,
>
> Thank you. I fixed that in my development branch.

Still not right.  It should be

2020-03-12  Martin Liska  

PR binutils/25640
* plugin-api.h (struct ld_plugin_symbol): Split int def into 4
char fields.
(enum ld_plugin_symbol_type): New.
(enum ld_plugin_symbol_section_flags): New.
(enum ld_plugin_tag): Add LDPT_ADD_SYMBOLS_V2 and
LDPT_GET_SYMBOLS_V4.

It is LDPT_ADD_SYMBOLS_V2, not LDPT_GET_SYMBOLS_V2.

-- 
H.J.


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-16 Thread Martin Liška

On 3/16/20 12:12 PM, H.J. Lu wrote:

(enum ld_plugin_tag): Add LDPT_GET_SYMBOLS_V and
^^^ Typo,


Thank you. I fixed that in my development branch.

Martin


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-16 Thread H.J. Lu via Gcc-patches
On Thu, Mar 12, 2020 at 7:54 AM Martin Liška  wrote:
>
> Hi.
>
> There's updated version of the patch.
> Changes from the previous version:
> - comment added to ld_plugin_symbol
> - new section renamed to ext_symtab
> - assert added for loop iterations in produce_symtab and 
> produce_symtab_extension
>
> Martin

2020-03-12  Martin Liska  

* plugin-api.h (struct ld_plugin_symbol): Split
int def into 4 char fields.
(enum ld_plugin_symbol_type): New.
(enum ld_plugin_symbol_section_flags): New.
(enum ld_plugin_tag): Add LDPT_GET_SYMBOLS_V and
   ^^^ Typo,
should be LDPT_ADD_SYMBOLS_V2.
LDPT_GET_SYMBOLS_V4.


-- 
H.J.


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-12 Thread Martin Liška

Hi.

There's updated version of the patch.
Changes from the previous version:
- comment added to ld_plugin_symbol
- new section renamed to ext_symtab
- assert added for loop iterations in produce_symtab and 
produce_symtab_extension

Martin
>From ad24565cfb3166fdd9995381b25c1f558c7bcf8c Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH 2/2] API extension for binutils (type of symbols).

gcc/ChangeLog:

2020-03-12  Martin Liska  

	* lto-section-in.c: Add extension_symtab.
	* lto-streamer-out.c (write_symbol_extension_info):
	New.
	(produce_symtab_extension): New.
	(produce_asm_for_decls): Stream also produce_symtab_extension.
	* lto-streamer.h (enum lto_section_type): New section.

include/ChangeLog:

2020-03-12  Martin Liska  

	* lto-symtab.h (enum gcc_plugin_symbol_type): New.
	(enum gcc_plugin_symbol_section_flags): Likewise.

lto-plugin/ChangeLog:

2020-03-12  Martin Liska  

	* lto-plugin.c (LTO_SECTION_PREFIX): Rename to
	...
	(LTO_SYMTAB_PREFIX): ... this.
	(LTO_SECTION_PREFIX_LEN): Rename to ...
	(LTO_SYMTAB_PREFIX_LEN): ... this.
	(LTO_SYMTAB_EXT_PREFIX): New.
	(LTO_SYMTAB_EXT_PREFIX_LEN): New.
	(LTO_LTO_PREFIX): New.
	(LTO_LTO_PREFIX_LEN): New.
	(parse_table_entry): Fill up unused to zero.
	(parse_table_entry_extension): New.
	(parse_symtab_extension): New.
	(finish_conflict_resolution): Change type
	for resolution.
	(clear_new_symtab_flags): New.
	(write_resolution): Support new get_symbols_v4.
	(process_symtab): Use new macro name.
	(process_symtab_extension): New.
	(claim_file_handler): Parse also process_symtab_extension.
	(onload): Call new add_symbols_v2.
---
 gcc/lto-section-in.c|   3 +-
 gcc/lto-streamer-out.c  |  76 +-
 gcc/lto-streamer.h  |   1 +
 include/lto-symtab.h|  12 +++
 lto-plugin/lto-plugin.c | 165 +++-
 5 files changed, 237 insertions(+), 20 deletions(-)

diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index c17dd69dbdd..78b015be696 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
   "mode_table",
   "hsa",
   "lto",
-  "ipa_sra"
+  "ipa_sra",
+  "ext_symtab"
 };
 
 /* Hooks so that the ipa passes can call into the lto front end to get
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cea5e71cffb..e459ec91856 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "output.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2777,12 +2778,32 @@ write_symbol (struct streamer_tree_cache_d *cache,
   lto_write_data (_num, 4);
 }
 
+/* Write extension information for symbols (symbol type, section flags).  */
+
+static void
+write_symbol_extension_info (tree t)
+{
+  unsigned char c;
+  c = ((unsigned char) TREE_CODE (t) == VAR_DECL
+   ? GCCST_VARIABLE : GCCST_FUNCTION);
+  lto_write_data (, 1);
+  unsigned char section_flags = 0;
+  if (TREE_CODE (t) == VAR_DECL)
+{
+  section *s = get_variable_section (t, false);
+  if (s->common.flags & SECTION_BSS)
+	section_flags |= GCCSSS_BSS;
+}
+  lto_write_data (_flags, 1);
+}
+
 /* Write an IL symbol table to OB.
SET and VSET are cgraph/varpool node sets we are outputting.  */
 
-static void
+static unsigned int
 produce_symtab (struct output_block *ob)
 {
+  unsigned int streamed_symbols = 0;
   struct streamer_tree_cache_d *cache = ob->writer_cache;
   char *section_name = lto_get_section_name (LTO_section_symtab, NULL, 0, NULL);
   lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
@@ -2804,6 +2825,7 @@ produce_symtab (struct output_block *ob)
   if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
 	continue;
   write_symbol (cache, node->decl, , false);
+  ++streamed_symbols;
 }
   for (lsei = lsei_start (encoder);
!lsei_end_p (lsei); lsei_next ())
@@ -2813,8 +2835,55 @@ produce_symtab (struct output_block *ob)
   if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
 	continue;
   write_symbol (cache, node->decl, , false);
+  ++streamed_symbols;
+}
+
+  lto_end_section ();
+
+  return streamed_symbols;
+}
+
+/* Write an IL symbol table extension to OB.
+   SET and VSET are cgraph/varpool node sets we are outputting.  */
+
+static void
+produce_symtab_extension (struct output_block *ob,
+			  unsigned int previous_streamed_symbols)
+{
+  unsigned int streamed_symbols = 0;
+  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
+	 NULL, 0, NULL);
+  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
+  lto_symtab_encoder_iterator lsei;
+
+  lto_begin_section (section_name, false);
+  free (section_name);
+
+  /* Write the symbol table.
+ 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-12 Thread Martin Liška

On 3/12/20 2:15 PM, Richard Biener wrote:

#elif defined __LITTLE_ENDIAN__


Hmm, the macro is not defined. Even a simple test-case shows that:

$ cat le.c
#ifdef __LITTLE_ENDIAN__
asdfa
#endif

$ gcc le.c -c
[no error]


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-12 Thread Richard Biener via Gcc-patches
On Thu, Mar 12, 2020 at 2:11 PM Martin Liška  wrote:
>
> On 3/12/20 1:55 PM, Jan Hubicka wrote:
> >> gcc/ChangeLog:
> >>
> >> 2020-03-12  Martin Liska  
> >>
> >>  * lto-section-in.c: Add extension_symtab.
> >>  * lto-streamer-out.c (write_symbol_extension_info):
> >>  New.
> >>  (produce_symtab_extension): New.
> >>  (produce_asm_for_decls): Stream also produce_symtab_extension.
> >>  * lto-streamer.h (enum lto_section_type): New section.
> >>
> >> include/ChangeLog:
> >>
> >> 2020-03-12  Martin Liska  
> >>
> >>  * lto-symtab.h (enum gcc_plugin_symbol_type): New.
> >>  (enum gcc_plugin_symbol_section_flags): Likewise.
> >>
> >> lto-plugin/ChangeLog:
> >>
> >> 2020-03-12  Martin Liska  
> >>
> >>  * lto-plugin.c (LTO_SECTION_PREFIX): Rename to
> >>  ...
> >>  (LTO_SYMTAB_PREFIX): ... this.
> >>  (LTO_SECTION_PREFIX_LEN): Rename to ...
> >>  (LTO_SYMTAB_PREFIX_LEN): ... this.
> >>  (LTO_SYMTAB_EXT_PREFIX): New.
> >>  (LTO_SYMTAB_EXT_PREFIX_LEN): New.
> >>  (LTO_LTO_PREFIX): New.
> >>  (LTO_LTO_PREFIX_LEN): New.
> >>  (parse_table_entry): Fill up unused to zero.
> >>  (parse_table_entry_extension): New.
> >>  (parse_symtab_extension): New.
> >>  (finish_conflict_resolution): Change type
> >>  for resolution.
> >>  (clear_new_symtab_flags): New.
> >>  (write_resolution): Support new get_symbols_v4.
> >>  (process_symtab): Use new macro name.
> >>  (process_symtab_extension): New.
> >>  (claim_file_handler): Parse also process_symtab_extension.
> >>  (onload): Call new add_symbols_v2.
> > Hi,
> > thanks for updating the patch.  Two minor nits
> >> ---
> >>   gcc/lto-section-in.c|   3 +-
> >>   gcc/lto-streamer-out.c  |  64 +++-
> >>   gcc/lto-streamer.h  |   1 +
> >>   include/lto-symtab.h|  12 +++
> >>   lto-plugin/lto-plugin.c | 165 +++-
> >>   5 files changed, 226 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
> >> index c17dd69dbdd..7bf59c513fc 100644
> >> --- a/gcc/lto-section-in.c
> >> +++ b/gcc/lto-section-in.c
> >> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
> >> "mode_table",
> >> "hsa",
> >> "lto",
> >> -  "ipa_sra"
> >> +  "ipa_sra",
> >> +  "extension_symtab"
> > I would call it symtab_ext so alphabetically it is coupled with symtab.

Maybe ext_symtab then, proper enlish for the long form would be
extended_symtab I guess.

> It's problematic because of this collision where we search for prefix:
>
> if (strncmp (name, LTO_SYMTAB_PREFIX, LTO_SYMTAB_PREFIX_LEN) != 0)
>
> Note that the section names have suffix:
> .gnu.lto_.symtab.48f1e54b1c048b0f
>
> > I wonder if moving it up in the enum would make it physically appear
> > next to .symtab in object file so we save a bit of i/o?
>
> It's not about names order, but streaming order. Which should be fine:
>
> $ readelf -SW /tmp/bss.o
> ...
>[10] .gnu.lto_.decls.48f1e54b1c048b0f PROGBITS 
> 0001dd 0002de 00   E  0   0  1
>[11] .gnu.lto_.symtab.48f1e54b1c048b0f PROGBITS 
> 0004bb 49 00   E  0   0  1
>[12] .gnu.lto_.extension_symtab.48f1e54b1c048b0f PROGBITS
>  000504 06 00   E  0   0  1
>[13] .gnu.lto_.optsPROGBITS 00050a 51 00   
> E  0   0  1
> ...
>
>
> >> +static void
> >> +produce_symtab_extension (struct output_block *ob)
> >> +{
> >> +  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
> >> + NULL, 0, NULL);
> >> +  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
> >> +  lto_symtab_encoder_iterator lsei;
> >> +
> >> +  lto_begin_section (section_name, false);
> >> +  free (section_name);
> >> +
> >> +  /* Write the symbol table.
> >> + First write everything defined and then all declarations.
> >> + This is necessary to handle cases where we have duplicated symbols.  
> >> */
> >> +  for (lsei = lsei_start (encoder);
> >> +   !lsei_end_p (lsei); lsei_next ())
> >> +{
> >> +  symtab_node *node = lsei_node (lsei);
> >> +
> >> +  if (DECL_EXTERNAL (node->decl) || 
> >> !node->output_to_lto_symbol_table_p ())
> >> +continue;
> >> +  write_symbol_extension_info (node->decl);
> >> +}
> >> +  for (lsei = lsei_start (encoder);
> >> +   !lsei_end_p (lsei); lsei_next ())
> >> +{
> >> +  symtab_node *node = lsei_node (lsei);
> >> +
> >> +  if (!DECL_EXTERNAL (node->decl) || 
> >> !node->output_to_lto_symbol_table_p ())
> >> +continue;
> >> +  write_symbol_extension_info (node->decl);
> >> +}
> >> +
> >> +  lto_end_section ();
> >> +}
> >
> > It seems fragile to me to duplicate the loops that needs to be kept in
> > sync because breaking that will likely get bit suprising outcome (i.e.
> > bootstrap will work since we do not care 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-12 Thread Martin Liška

On 3/12/20 1:55 PM, Jan Hubicka wrote:

gcc/ChangeLog:

2020-03-12  Martin Liska  

* lto-section-in.c: Add extension_symtab.
* lto-streamer-out.c (write_symbol_extension_info):
New.
(produce_symtab_extension): New.
(produce_asm_for_decls): Stream also produce_symtab_extension.
* lto-streamer.h (enum lto_section_type): New section.

include/ChangeLog:

2020-03-12  Martin Liska  

* lto-symtab.h (enum gcc_plugin_symbol_type): New.
(enum gcc_plugin_symbol_section_flags): Likewise.

lto-plugin/ChangeLog:

2020-03-12  Martin Liska  

* lto-plugin.c (LTO_SECTION_PREFIX): Rename to
...
(LTO_SYMTAB_PREFIX): ... this.
(LTO_SECTION_PREFIX_LEN): Rename to ...
(LTO_SYMTAB_PREFIX_LEN): ... this.
(LTO_SYMTAB_EXT_PREFIX): New.
(LTO_SYMTAB_EXT_PREFIX_LEN): New.
(LTO_LTO_PREFIX): New.
(LTO_LTO_PREFIX_LEN): New.
(parse_table_entry): Fill up unused to zero.
(parse_table_entry_extension): New.
(parse_symtab_extension): New.
(finish_conflict_resolution): Change type
for resolution.
(clear_new_symtab_flags): New.
(write_resolution): Support new get_symbols_v4.
(process_symtab): Use new macro name.
(process_symtab_extension): New.
(claim_file_handler): Parse also process_symtab_extension.
(onload): Call new add_symbols_v2.

Hi,
thanks for updating the patch.  Two minor nits

---
  gcc/lto-section-in.c|   3 +-
  gcc/lto-streamer-out.c  |  64 +++-
  gcc/lto-streamer.h  |   1 +
  include/lto-symtab.h|  12 +++
  lto-plugin/lto-plugin.c | 165 +++-
  5 files changed, 226 insertions(+), 19 deletions(-)

diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index c17dd69dbdd..7bf59c513fc 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
"mode_table",
"hsa",
"lto",
-  "ipa_sra"
+  "ipa_sra",
+  "extension_symtab"

I would call it symtab_ext so alphabetically it is coupled with symtab.


It's problematic because of this collision where we search for prefix:

if (strncmp (name, LTO_SYMTAB_PREFIX, LTO_SYMTAB_PREFIX_LEN) != 0)

Note that the section names have suffix:
.gnu.lto_.symtab.48f1e54b1c048b0f


I wonder if moving it up in the enum would make it physically appear
next to .symtab in object file so we save a bit of i/o?


It's not about names order, but streaming order. Which should be fine:

$ readelf -SW /tmp/bss.o
...
  [10] .gnu.lto_.decls.48f1e54b1c048b0f PROGBITS 0001dd 
0002de 00   E  0   0  1
  [11] .gnu.lto_.symtab.48f1e54b1c048b0f PROGBITS 
0004bb 49 00   E  0   0  1
  [12] .gnu.lto_.extension_symtab.48f1e54b1c048b0f PROGBITS
 000504 06 00   E  0   0  1
  [13] .gnu.lto_.optsPROGBITS 00050a 51 00   E  
0   0  1
...



+static void
+produce_symtab_extension (struct output_block *ob)
+{
+  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
+NULL, 0, NULL);
+  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
+  lto_symtab_encoder_iterator lsei;
+
+  lto_begin_section (section_name, false);
+  free (section_name);
+
+  /* Write the symbol table.
+ First write everything defined and then all declarations.
+ This is necessary to handle cases where we have duplicated symbols.  */
+  for (lsei = lsei_start (encoder);
+   !lsei_end_p (lsei); lsei_next ())
+{
+  symtab_node *node = lsei_node (lsei);
+
+  if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p ())
+   continue;
+  write_symbol_extension_info (node->decl);
+}
+  for (lsei = lsei_start (encoder);
+   !lsei_end_p (lsei); lsei_next ())
+{
+  symtab_node *node = lsei_node (lsei);
+
+  if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p 
())
+   continue;
+  write_symbol_extension_info (node->decl);
+}
+
+  lto_end_section ();
+}


It seems fragile to me to duplicate the loops that needs to be kept in
sync because breaking that will likely get bit suprising outcome (i.e.
bootstrap will work since we do not care about symbol types). Perhaps it
would be more robust to simply push the extension bits into a vector in
write_symbol for later streaming.  Or at least add comment that loops
needs to be kept in sync.


I'll add comment.


diff --git a/include/plugin-api.h b/include/plugin-api.h
index 09e1202df07..804684449cf 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -87,7 +87,17 @@ struct ld_plugin_symbol
  {
char *name;
char *version;
-  int def;
+#ifdef __BIG_ENDIAN__
+  char unused;
+  char section_flags;
+  char symbol_type;
+  char def;
+#else
+  char def;
+  char symbol_type;
+  char 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-12 Thread Jan Hubicka
> gcc/ChangeLog:
> 
> 2020-03-12  Martin Liska  
> 
>   * lto-section-in.c: Add extension_symtab.
>   * lto-streamer-out.c (write_symbol_extension_info):
>   New.
>   (produce_symtab_extension): New.
>   (produce_asm_for_decls): Stream also produce_symtab_extension.
>   * lto-streamer.h (enum lto_section_type): New section.
> 
> include/ChangeLog:
> 
> 2020-03-12  Martin Liska  
> 
>   * lto-symtab.h (enum gcc_plugin_symbol_type): New.
>   (enum gcc_plugin_symbol_section_flags): Likewise.
> 
> lto-plugin/ChangeLog:
> 
> 2020-03-12  Martin Liska  
> 
>   * lto-plugin.c (LTO_SECTION_PREFIX): Rename to
>   ...
>   (LTO_SYMTAB_PREFIX): ... this.
>   (LTO_SECTION_PREFIX_LEN): Rename to ...
>   (LTO_SYMTAB_PREFIX_LEN): ... this.
>   (LTO_SYMTAB_EXT_PREFIX): New.
>   (LTO_SYMTAB_EXT_PREFIX_LEN): New.
>   (LTO_LTO_PREFIX): New.
>   (LTO_LTO_PREFIX_LEN): New.
>   (parse_table_entry): Fill up unused to zero.
>   (parse_table_entry_extension): New.
>   (parse_symtab_extension): New.
>   (finish_conflict_resolution): Change type
>   for resolution.
>   (clear_new_symtab_flags): New.
>   (write_resolution): Support new get_symbols_v4.
>   (process_symtab): Use new macro name.
>   (process_symtab_extension): New.
>   (claim_file_handler): Parse also process_symtab_extension.
>   (onload): Call new add_symbols_v2.
Hi,
thanks for updating the patch.  Two minor nits
> ---
>  gcc/lto-section-in.c|   3 +-
>  gcc/lto-streamer-out.c  |  64 +++-
>  gcc/lto-streamer.h  |   1 +
>  include/lto-symtab.h|  12 +++
>  lto-plugin/lto-plugin.c | 165 +++-
>  5 files changed, 226 insertions(+), 19 deletions(-)
> 
> diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
> index c17dd69dbdd..7bf59c513fc 100644
> --- a/gcc/lto-section-in.c
> +++ b/gcc/lto-section-in.c
> @@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
>"mode_table",
>"hsa",
>"lto",
> -  "ipa_sra"
> +  "ipa_sra",
> +  "extension_symtab"
I would call it symtab_ext so alphabetically it is coupled with symtab.
I wonder if moving it up in the enum would make it physically appear
next to .symtab in object file so we save a bit of i/o?
> +static void
> +produce_symtab_extension (struct output_block *ob)
> +{
> +  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
> +  NULL, 0, NULL);
> +  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
> +  lto_symtab_encoder_iterator lsei;
> +
> +  lto_begin_section (section_name, false);
> +  free (section_name);
> +
> +  /* Write the symbol table.
> + First write everything defined and then all declarations.
> + This is necessary to handle cases where we have duplicated symbols.  */
> +  for (lsei = lsei_start (encoder);
> +   !lsei_end_p (lsei); lsei_next ())
> +{
> +  symtab_node *node = lsei_node (lsei);
> +
> +  if (DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p 
> ())
> + continue;
> +  write_symbol_extension_info (node->decl);
> +}
> +  for (lsei = lsei_start (encoder);
> +   !lsei_end_p (lsei); lsei_next ())
> +{
> +  symtab_node *node = lsei_node (lsei);
> +
> +  if (!DECL_EXTERNAL (node->decl) || !node->output_to_lto_symbol_table_p 
> ())
> + continue;
> +  write_symbol_extension_info (node->decl);
> +}
> +
> +  lto_end_section ();
> +}

It seems fragile to me to duplicate the loops that needs to be kept in
sync because breaking that will likely get bit suprising outcome (i.e.
bootstrap will work since we do not care about symbol types). Perhaps it
would be more robust to simply push the extension bits into a vector in
write_symbol for later streaming.  Or at least add comment that loops
needs to be kept in sync.
> diff --git a/include/plugin-api.h b/include/plugin-api.h
> index 09e1202df07..804684449cf 100644
> --- a/include/plugin-api.h
> +++ b/include/plugin-api.h
> @@ -87,7 +87,17 @@ struct ld_plugin_symbol
>  {
>char *name;
>char *version;
> -  int def;
> +#ifdef __BIG_ENDIAN__
> +  char unused;
> +  char section_flags;
> +  char symbol_type;
> +  char def;
> +#else
> +  char def;
> +  char symbol_type;
> +  char section_flags;
> +  char unused;
> +#endif
Perhaps at least drop a comment why this is done this way (i.e.
compatibility with V3 version of API) so in 10 years we know?

Bit more systematic way would be to add a new hook to query extended
properties of a given symbol. I.e. something like
void *get_symbol_property (symbol, enum property)

Honza


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-12 Thread Martin Liška

Hi.

I'm sending extended version of the patch where I address Honza's note
about backward compatibility. So I add a next section _lto.extension_symtab
where I put new bits. With that, new LTO bytecode can be opened with older
LTO plugin.

Moreover, I've also tested lto.exp tests and binutils master. And I can confirm
that patches work with H.J.'s counterpart:
https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/plugin/master

$ cat bss.c
int global_zero;
int global_one = 1;

int main() { return 0; }

$ gcc -c -flto bss.c

Old bintuils (with new plugin):

$ ./binutils/nm-new /tmp/bss.o --plugin 
/home/marxin/Programming/gcc2/objdir/gcc/liblto_plugin.so.0.0.0
 T global_one
 T global_zero
 T main

H.J.'s binutils (with new plugin):

$ ./binutils/nm-new /tmp/bss.o --plugin 
/home/marxin/Programming/gcc2/objdir/gcc/liblto_plugin.so.0.0.0
 D global_one
 B global_zero
 T main

System binutils (with old plugin):

$ nm /tmp/bss.o
 T global_one
 T global_zero
 T main

Hope I'm done now.
Thoughts?

Thanks,
Martin
>From 8968bbdf627dc26056f1cb344e253eedae0efca1 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH 2/2] API extension for binutils (type of symbols).

gcc/ChangeLog:

2020-03-12  Martin Liska  

	* lto-section-in.c: Add extension_symtab.
	* lto-streamer-out.c (write_symbol_extension_info):
	New.
	(produce_symtab_extension): New.
	(produce_asm_for_decls): Stream also produce_symtab_extension.
	* lto-streamer.h (enum lto_section_type): New section.

include/ChangeLog:

2020-03-12  Martin Liska  

	* lto-symtab.h (enum gcc_plugin_symbol_type): New.
	(enum gcc_plugin_symbol_section_flags): Likewise.

lto-plugin/ChangeLog:

2020-03-12  Martin Liska  

	* lto-plugin.c (LTO_SECTION_PREFIX): Rename to
	...
	(LTO_SYMTAB_PREFIX): ... this.
	(LTO_SECTION_PREFIX_LEN): Rename to ...
	(LTO_SYMTAB_PREFIX_LEN): ... this.
	(LTO_SYMTAB_EXT_PREFIX): New.
	(LTO_SYMTAB_EXT_PREFIX_LEN): New.
	(LTO_LTO_PREFIX): New.
	(LTO_LTO_PREFIX_LEN): New.
	(parse_table_entry): Fill up unused to zero.
	(parse_table_entry_extension): New.
	(parse_symtab_extension): New.
	(finish_conflict_resolution): Change type
	for resolution.
	(clear_new_symtab_flags): New.
	(write_resolution): Support new get_symbols_v4.
	(process_symtab): Use new macro name.
	(process_symtab_extension): New.
	(claim_file_handler): Parse also process_symtab_extension.
	(onload): Call new add_symbols_v2.
---
 gcc/lto-section-in.c|   3 +-
 gcc/lto-streamer-out.c  |  64 +++-
 gcc/lto-streamer.h  |   1 +
 include/lto-symtab.h|  12 +++
 lto-plugin/lto-plugin.c | 165 +++-
 5 files changed, 226 insertions(+), 19 deletions(-)

diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index c17dd69dbdd..7bf59c513fc 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -54,7 +54,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
   "mode_table",
   "hsa",
   "lto",
-  "ipa_sra"
+  "ipa_sra",
+  "extension_symtab"
 };
 
 /* Hooks so that the ipa passes can call into the lto front end to get
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cea5e71cffb..2fd20252ba5 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "output.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2777,6 +2778,25 @@ write_symbol (struct streamer_tree_cache_d *cache,
   lto_write_data (_num, 4);
 }
 
+/* Write extension information for symbols (symbol type, section flags).  */
+
+static void
+write_symbol_extension_info (tree t)
+{
+  unsigned char c;
+  c = ((unsigned char) TREE_CODE (t) == VAR_DECL
+   ? GCCST_VARIABLE : GCCST_FUNCTION);
+  lto_write_data (, 1);
+  unsigned char section_flags = 0;
+  if (TREE_CODE (t) == VAR_DECL)
+{
+  section *s = get_variable_section (t, false);
+  if (s->common.flags & SECTION_BSS)
+	section_flags |= GCCSSS_BSS;
+}
+  lto_write_data (_flags, 1);
+}
+
 /* Write an IL symbol table to OB.
SET and VSET are cgraph/varpool node sets we are outputting.  */
 
@@ -2818,6 +2838,45 @@ produce_symtab (struct output_block *ob)
   lto_end_section ();
 }
 
+/* Write an IL symbol table extension to OB.
+   SET and VSET are cgraph/varpool node sets we are outputting.  */
+
+static void
+produce_symtab_extension (struct output_block *ob)
+{
+  char *section_name = lto_get_section_name (LTO_section_symtab_extension,
+	 NULL, 0, NULL);
+  lto_symtab_encoder_t encoder = ob->decl_state->symtab_node_encoder;
+  lto_symtab_encoder_iterator lsei;
+
+  lto_begin_section (section_name, false);
+  free (section_name);
+
+  /* Write the symbol table.
+ First write everything defined and then all declarations.
+ This is necessary to 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-11 Thread Alexander Monakov via Gcc-patches
On Wed, 11 Mar 2020, Martin Liška wrote:

> > Is there a comprehensive list of plugins out in the wild using the LD
> > plugin API?
> 
> I know only about:
> $ ls /usr/lib/bfd-plugins
> liblto_plugin.so.0.0.0  LLVMgold.so
> 
> and I know about Alexander Monakov (some dead code elimination plug-in).

I don't recall seeing any other plugin when working on our "debloating"
stuff, but of course I suppose people could have used the plugin API
for experimentation.  I don't think a comprehensive list could exist.

> > Note we also have to bring in gold folks (not sure if lld also
> > implements the same plugin API)
> 
> I don't see how can they be affected?

As discussed downthread the other linkers could freely ignore the
new callback, but fwiw LLD has no plans to implement the plugin-api.h
interface: https://bugs.llvm.org/show_bug.cgi?id=42446

Alexander


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-11 Thread Martin Liška

On 3/11/20 1:51 PM, Richard Biener wrote:

Splitting an existing field isn't hackish IMHO.  I guess even
explicitely changing it
to one short and two char fields would be OK.


Good, I'm doing that in the updated version of that patch.
H.J. is right now working on the corresponding binutils counterpart.

Thoughts?
Martin
>From 8a2e73edb53f26a1b6a543e2e2027b2661a48f94 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH] API extension for binutils (type of symbols).

---
 gcc/lto-streamer-out.c  | 12 +
 include/lto-symtab.h| 12 +
 include/plugin-api.h| 30 -
 lto-plugin/lto-plugin.c | 99 +
 4 files changed, 142 insertions(+), 11 deletions(-)

diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cea5e71cffb..b212be35b3d 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "output.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2773,6 +2774,17 @@ write_symbol (struct streamer_tree_cache_d *cache,
   lto_write_data (, 1);
   c = (unsigned char) visibility;
   lto_write_data (, 1);
+  c = ((unsigned char) TREE_CODE (t) == VAR_DECL
+   ? GCCST_VARIABLE : GCCST_FUNCTION);
+  lto_write_data (, 1);
+  unsigned char section_flags = 0;
+  if (TREE_CODE (t) == VAR_DECL)
+{
+  section *s = get_variable_section (t, false);
+  if (s->common.flags & SECTION_BSS)
+	section_flags |= GCCSSS_BSS;
+}
+  lto_write_data (_flags, 1);
   lto_write_data (, 8);
   lto_write_data (_num, 4);
 }
diff --git a/include/lto-symtab.h b/include/lto-symtab.h
index 0ce0de10121..47f0ff27df8 100644
--- a/include/lto-symtab.h
+++ b/include/lto-symtab.h
@@ -38,4 +38,16 @@ enum gcc_plugin_symbol_visibility
 GCCPV_HIDDEN
   };
 
+enum gcc_plugin_symbol_type
+{
+  GCCST_UNKNOWN,
+  GCCST_FUNCTION,
+  GCCST_VARIABLE,
+};
+
+enum gcc_plugin_symbol_section_flags
+{
+  GCCSSS_BSS = 1
+};
+
 #endif /* GCC_LTO_SYMTAB_H  */
diff --git a/include/plugin-api.h b/include/plugin-api.h
index 09e1202df07..804684449cf 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -87,7 +87,17 @@ struct ld_plugin_symbol
 {
   char *name;
   char *version;
-  int def;
+#ifdef __BIG_ENDIAN__
+  char unused;
+  char section_flags;
+  char symbol_type;
+  char def;
+#else
+  char def;
+  char symbol_type;
+  char section_flags;
+  char unused;
+#endif
   int visibility;
   uint64_t size;
   char *comdat_key;
@@ -123,6 +133,20 @@ enum ld_plugin_symbol_visibility
   LDPV_HIDDEN
 };
 
+/* The type of the symbol.  */
+
+enum ld_plugin_symbol_type
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE,
+};
+
+enum ld_plugin_symbol_section_flags
+{
+  LDSSS_BSS = 1
+};
+
 /* How a symbol is resolved.  */
 
 enum ld_plugin_symbol_resolution
@@ -431,7 +455,9 @@ enum ld_plugin_tag
   LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
   LDPT_GET_INPUT_SECTION_SIZE = 30,
   LDPT_REGISTER_NEW_INPUT_HOOK = 31,
-  LDPT_GET_WRAP_SYMBOLS = 32
+  LDPT_GET_WRAP_SYMBOLS = 32,
+  LDPT_ADD_SYMBOLS_V2 = 33,
+  LDPT_GET_SYMBOLS_V4 = 34,
 };
 
 /* The plugin transfer vector.  */
diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index c307fc871bf..bce9f183049 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -90,6 +90,8 @@ along with this program; see the file COPYING3.  If not see
 
 #define LTO_SECTION_PREFIX	".gnu.lto_.symtab"
 #define LTO_SECTION_PREFIX_LEN	(sizeof (LTO_SECTION_PREFIX) - 1)
+#define LTO_LTO_PREFIX		".gnu.lto_.lto"
+#define LTO_LTO_PREFIX_LEN	(sizeof (LTO_LTO_PREFIX) - 1)
 #define OFFLOAD_SECTION		".gnu.offload_lto_.opts"
 #define OFFLOAD_SECTION_LEN	(sizeof (OFFLOAD_SECTION) - 1)
 
@@ -154,12 +156,12 @@ enum symbol_style
 static char *arguments_file_name;
 static ld_plugin_register_claim_file register_claim_file;
 static ld_plugin_register_all_symbols_read register_all_symbols_read;
-static ld_plugin_get_symbols get_symbols, get_symbols_v2;
+static ld_plugin_get_symbols get_symbols, get_symbols_v2, get_symbols_v4;
 static ld_plugin_register_cleanup register_cleanup;
 static ld_plugin_add_input_file add_input_file;
 static ld_plugin_add_input_library add_input_library;
 static ld_plugin_message message;
-static ld_plugin_add_symbols add_symbols;
+static ld_plugin_add_symbols add_symbols, add_symbols_v2;
 
 static struct plugin_file_info *claimed_files = NULL;
 static unsigned int num_claimed_files = 0;
@@ -221,6 +223,20 @@ check_1 (int gate, enum ld_plugin_level level, const char *text)
 }
 }
 
+/* Structure that represents LTO ELF section with information
+   about the format.  */
+
+struct lto_section
+ {
+   int16_t major_version;
+   int16_t minor_version;
+   unsigned char slim_object: 1;
+   unsigned char compression: 4;
+   int32_t reserved0: 27;
+};
+
+struct 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-11 Thread Martin Liška

On 3/11/20 2:24 PM, H.J. Lu wrote:

ld-new don't have to use the new interface since it isn't needed.


Yeah. We talk about nm that should utilize the new API, right?

Martin


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-11 Thread H.J. Lu via Gcc-patches
On Wed, Mar 11, 2020 at 6:09 AM Martin Liška  wrote:
>
> On 3/11/20 1:51 PM, Richard Biener wrote:
> > I'm not sure I understand the versioning, we should aim at something where
> > an updated plugin can talk to old and new ld and where a new ld can also 
> > talk
> > to an old plugin.  That requires an arbitration which I don't see 
> > implemented?
>
> So ld-new will set new callbacks tv_add_symbols_v2 and tv_get_symbols_v4, 
> these contain

ld-new don't have to use the new interface since it isn't needed.


-- 
H.J.


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-11 Thread Martin Liška

On 3/11/20 1:51 PM, Richard Biener wrote:

I'm not sure I understand the versioning, we should aim at something where
an updated plugin can talk to old and new ld and where a new ld can also talk
to an old plugin.  That requires an arbitration which I don't see implemented?


So ld-new will set new callbacks tv_add_symbols_v2 and tv_get_symbols_v4, these 
contain
ld_plugin_symbol_v2 with filled ::symbol_type. ld-new will check for he new API 
with
LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2 and will tv_add_symbols and 
tv_get_symbols callbacks.

ld-old will use the old tv_add_symbols and tv_get_symbols callback.

One another level of compatibility is the LTO bytecode, where new plugin should
handle also old LTO bytecode. That's done with parsing of LTO bytecode version
(lto_section).



Splitting an existing field isn't hackish IMHO.  I guess even
explicitely changing it
to one short and two char fields would be OK.


Then I'm fine with that as well. Am I right that the split order
will depend of the endianess?



Is there a comprehensive list of plugins out in the wild using the LD
plugin API?


I know only about:
$ ls /usr/lib/bfd-plugins
liblto_plugin.so.0.0.0  LLVMgold.so

and I know about Alexander Monakov (some dead code elimination plug-in).



Note we also have to bring in gold folks (not sure if lld also
implements the same
plugin API)


I don't see how can they be affected?

Martin




Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-11 Thread Richard Biener via Gcc-patches
On Wed, Mar 11, 2020 at 1:22 PM Martin Liška  wrote:
>
> On 3/11/20 11:22 AM, Richard Biener wrote:
> > On Wed, Mar 11, 2020 at 10:19 AM Martin Liška  wrote:
> >>
> >> On 3/10/20 1:07 PM, Martin Liška wrote:
> >>> On 3/10/20 12:24 PM, Richard Biener wrote:
>  Not sure how symtab is encoded right now but we also could have
> >>>
> >>> Ok, right now I don't see symtab entry much extensible.
> >>>
> >>> But what am I suggesting is to parse LTO bytecode version and then
> >>> process conditional parsing of lto_symtab section.
> >>>
> >>> Thoughts?
> >>> Martin
> >>
> >> So as H.J. correctly pointed I can't add new symbol_type into 
> >> ld_plugin_symbol struct.
> >> It would make ABI change as correctly identified by abidiff:
> >>
> >> abidiff /tmp/before.o /tmp/after.o
> >> Functions changes summary: 0 Removed, 1 Changed, 0 Added function
> >> Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
> >> Function symbols changes summary: 0 Removed, 0 Added function symbol not 
> >> referenced by debug info
> >> Variable symbols changes summary: 0 Removed, 1 Added variable symbol not 
> >> referenced by debug info
> >>
> >> 1 function with some indirect sub-type change:
> >>
> >> [C]'function ld_plugin_status onload(ld_plugin_tv*)' at 
> >> lto-plugin.c:1275:1 has some indirect sub-type changes:
> >>   parameter 1 of type 'ld_plugin_tv*' has sub-type changes:
> >> in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1:
> >>   type size hasn't changed
> >>   1 data member changes (1 filtered):
> >>type of 'union {int tv_val; const char* tv_string; 
> >> ld_plugin_register_claim_file tv_register_claim_file; 
> >> ld_plugin_register_all_symbols_read tv_register_all_symbols_read; 
> >> ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols 
> >> tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; 
> >> ld_plugin_add_input_file tv_add_input_file; ld_plugin_message tv_message; 
> >> ld_plugin_get_input_file tv_get_input_file; ld_plugin_get_view 
> >> tv_get_view; ld_plugin_release_input_file tv_release_input_file; 
> >> ld_plugin_add_input_library tv_add_input_library; 
> >> ld_plugin_set_extra_library_path tv_set_extra_library_path; 
> >> ld_plugin_get_input_section_count tv_get_input_section_count; 
> >> ld_plugin_get_input_section_type tv_get_input_section_type; 
> >> ld_plugin_get_input_section_name tv_get_input_section_name; 
> >> ld_plugin_get_input_section_contents tv_get_input_section_contents; 
> >> ld_plugin_update_section_order tv_update_section_order; 
> >> ld_plugin_allow_section_ordering tv_allow_section_ordering; 
> >> ld_plugin_allow_unique_segment_for_sections 
> >> tv_allow_unique_segment_for_sections; 
> >> ld_plugin_unique_segment_for_sections tv_unique_segment_for_sections; 
> >> ld_plugin_get_input_section_alignment tv_get_input_section_alignment; 
> >> ld_plugin_get_input_section_size tv_get_input_section_size; 
> >> ld_plugin_register_new_input tv_register_new_input; 
> >> ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} ld_plugin_tv::tv_u' 
> >> changed:
> >>  type size hasn't changed
> >>  1 data member changes (1 filtered):
> >>   type of 'ld_plugin_add_symbols tv_add_symbols' changed:
> >> underlying type 'enum ld_plugin_status (void*, int, const 
> >> ld_plugin_symbol*)*' changed:
> >>   in pointed to type 'function type enum ld_plugin_status 
> >> (void*, int, const ld_plugin_symbol*)':
> >> parameter 3 of type 'const ld_plugin_symbol*' has 
> >> sub-type changes:
> >>   in pointed to type 'const ld_plugin_symbol':
> >> in unqualified underlying type 'struct 
> >> ld_plugin_symbol' at plugin-api.h:86:1:
> >>   type size changed from 256 to 288 (in bits)
> >>   1 data member insertion:
> >> 'int ld_plugin_symbol::symbol_type', at offset 
> >> 256 (in bits) at plugin-api.h:95:1
> >>
> >> So that I need to come up with ld_plugin_symbol_v2. It brings more 
> >> challenges: one has 2 parallel symbol
> >> tables:
> >>
> >> struct plugin_symtab
> >> {
> >> ...
> >> struct ld_plugin_symbol_v2 *syms;
> >> struct ld_plugin_symbol *syms_v1;
> >> ...
> >> };
> >>
> >> and the information of these should by aligned.
> >>
> >> The patch can survive lto.exp and I would like to ask H.J. to write 
> >> bintuils counterpart that will
> >> utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2.
> >>
> >> Thoughts?
> >
> > Can't we simply have _V4/V2 use the upper half of
> > ld_plugin_symbol::def?  If the linker
> > then requests _V4 but the plugin cannot cope it could still "use" the
> > data but get
> > LDST_UNKNOWN (zero) there.
>
> Can be possible, but it's hack a bit. The plugin has a mechanisms for 
> versioning
> and this change does not align with the idea.

I'm not sure I understand the versioning, we 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-11 Thread Martin Liška

On 3/11/20 11:22 AM, Richard Biener wrote:

On Wed, Mar 11, 2020 at 10:19 AM Martin Liška  wrote:


On 3/10/20 1:07 PM, Martin Liška wrote:

On 3/10/20 12:24 PM, Richard Biener wrote:

Not sure how symtab is encoded right now but we also could have


Ok, right now I don't see symtab entry much extensible.

But what am I suggesting is to parse LTO bytecode version and then
process conditional parsing of lto_symtab section.

Thoughts?
Martin


So as H.J. correctly pointed I can't add new symbol_type into ld_plugin_symbol 
struct.
It would make ABI change as correctly identified by abidiff:

abidiff /tmp/before.o /tmp/after.o
Functions changes summary: 0 Removed, 1 Changed, 0 Added function
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
Function symbols changes summary: 0 Removed, 0 Added function symbol not 
referenced by debug info
Variable symbols changes summary: 0 Removed, 1 Added variable symbol not 
referenced by debug info

1 function with some indirect sub-type change:

[C]'function ld_plugin_status onload(ld_plugin_tv*)' at lto-plugin.c:1275:1 
has some indirect sub-type changes:
  parameter 1 of type 'ld_plugin_tv*' has sub-type changes:
in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1:
  type size hasn't changed
  1 data member changes (1 filtered):
   type of 'union {int tv_val; const char* tv_string; 
ld_plugin_register_claim_file tv_register_claim_file; 
ld_plugin_register_all_symbols_read tv_register_all_symbols_read; 
ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols 
tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; ld_plugin_add_input_file 
tv_add_input_file; ld_plugin_message tv_message; ld_plugin_get_input_file 
tv_get_input_file; ld_plugin_get_view tv_get_view; ld_plugin_release_input_file 
tv_release_input_file; ld_plugin_add_input_library tv_add_input_library; 
ld_plugin_set_extra_library_path tv_set_extra_library_path; 
ld_plugin_get_input_section_count tv_get_input_section_count; 
ld_plugin_get_input_section_type tv_get_input_section_type; 
ld_plugin_get_input_section_name tv_get_input_section_name; 
ld_plugin_get_input_section_contents tv_get_input_section_contents; 
ld_plugin_update_section_order tv_update_section_order; 
ld_plugin_allow_section_ordering tv_allow_section_ordering; 
ld_plugin_allow_unique_segment_for_sections 
tv_allow_unique_segment_for_sections; ld_plugin_unique_segment_for_sections 
tv_unique_segment_for_sections; ld_plugin_get_input_section_alignment 
tv_get_input_section_alignment; ld_plugin_get_input_section_size 
tv_get_input_section_size; ld_plugin_register_new_input tv_register_new_input; 
ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} ld_plugin_tv::tv_u' changed:
 type size hasn't changed
 1 data member changes (1 filtered):
  type of 'ld_plugin_add_symbols tv_add_symbols' changed:
underlying type 'enum ld_plugin_status (void*, int, const 
ld_plugin_symbol*)*' changed:
  in pointed to type 'function type enum ld_plugin_status 
(void*, int, const ld_plugin_symbol*)':
parameter 3 of type 'const ld_plugin_symbol*' has sub-type 
changes:
  in pointed to type 'const ld_plugin_symbol':
in unqualified underlying type 'struct 
ld_plugin_symbol' at plugin-api.h:86:1:
  type size changed from 256 to 288 (in bits)
  1 data member insertion:
'int ld_plugin_symbol::symbol_type', at offset 256 
(in bits) at plugin-api.h:95:1

So that I need to come up with ld_plugin_symbol_v2. It brings more challenges: 
one has 2 parallel symbol
tables:

struct plugin_symtab
{
...
struct ld_plugin_symbol_v2 *syms;
struct ld_plugin_symbol *syms_v1;
...
};

and the information of these should by aligned.

The patch can survive lto.exp and I would like to ask H.J. to write bintuils 
counterpart that will
utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2.

Thoughts?


Can't we simply have _V4/V2 use the upper half of
ld_plugin_symbol::def?  If the linker
then requests _V4 but the plugin cannot cope it could still "use" the
data but get
LDST_UNKNOWN (zero) there.


Can be possible, but it's hack a bit. The plugin has a mechanisms for versioning
and this change does not align with the idea.



IMHO LDST_VARIABLE_BSS is "misplaced"?  "BSS" is the section of the variable.


Yes.


If we want to encode more of ELF it should be LDST_OBJECT and LDST_FUNC.
Note there's also rodata vs data info that would be missing in case
we'd want tools
like readelf -s dump the symbol table of the IL part of an object.  It
looks like
nm can also distinguish rodata from data ("R", "r" vs "d") and "small object"
data sections (not sure what's that about).  It seems nm cannot distinguish
symbols in mergeable string sections (it dumps "R" for me there).  So intead
of mangling everything into enum 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-11 Thread Richard Biener via Gcc-patches
On Wed, Mar 11, 2020 at 11:22 AM Richard Biener
 wrote:
>
> On Wed, Mar 11, 2020 at 10:19 AM Martin Liška  wrote:
> >
> > On 3/10/20 1:07 PM, Martin Liška wrote:
> > > On 3/10/20 12:24 PM, Richard Biener wrote:
> > >> Not sure how symtab is encoded right now but we also could have
> > >
> > > Ok, right now I don't see symtab entry much extensible.
> > >
> > > But what am I suggesting is to parse LTO bytecode version and then
> > > process conditional parsing of lto_symtab section.
> > >
> > > Thoughts?
> > > Martin
> >
> > So as H.J. correctly pointed I can't add new symbol_type into 
> > ld_plugin_symbol struct.
> > It would make ABI change as correctly identified by abidiff:
> >
> > abidiff /tmp/before.o /tmp/after.o
> > Functions changes summary: 0 Removed, 1 Changed, 0 Added function
> > Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
> > Function symbols changes summary: 0 Removed, 0 Added function symbol not 
> > referenced by debug info
> > Variable symbols changes summary: 0 Removed, 1 Added variable symbol not 
> > referenced by debug info
> >
> > 1 function with some indirect sub-type change:
> >
> >[C]'function ld_plugin_status onload(ld_plugin_tv*)' at 
> > lto-plugin.c:1275:1 has some indirect sub-type changes:
> >  parameter 1 of type 'ld_plugin_tv*' has sub-type changes:
> >in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1:
> >  type size hasn't changed
> >  1 data member changes (1 filtered):
> >   type of 'union {int tv_val; const char* tv_string; 
> > ld_plugin_register_claim_file tv_register_claim_file; 
> > ld_plugin_register_all_symbols_read tv_register_all_symbols_read; 
> > ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols 
> > tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; 
> > ld_plugin_add_input_file tv_add_input_file; ld_plugin_message tv_message; 
> > ld_plugin_get_input_file tv_get_input_file; ld_plugin_get_view tv_get_view; 
> > ld_plugin_release_input_file tv_release_input_file; 
> > ld_plugin_add_input_library tv_add_input_library; 
> > ld_plugin_set_extra_library_path tv_set_extra_library_path; 
> > ld_plugin_get_input_section_count tv_get_input_section_count; 
> > ld_plugin_get_input_section_type tv_get_input_section_type; 
> > ld_plugin_get_input_section_name tv_get_input_section_name; 
> > ld_plugin_get_input_section_contents tv_get_input_section_contents; 
> > ld_plugin_update_section_order tv_update_section_order; 
> > ld_plugin_allow_section_ordering tv_allow_section_ordering; 
> > ld_plugin_allow_unique_segment_for_sections 
> > tv_allow_unique_segment_for_sections; ld_plugin_unique_segment_for_sections 
> > tv_unique_segment_for_sections; ld_plugin_get_input_section_alignment 
> > tv_get_input_section_alignment; ld_plugin_get_input_section_size 
> > tv_get_input_section_size; ld_plugin_register_new_input 
> > tv_register_new_input; ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} 
> > ld_plugin_tv::tv_u' changed:
> > type size hasn't changed
> > 1 data member changes (1 filtered):
> >  type of 'ld_plugin_add_symbols tv_add_symbols' changed:
> >underlying type 'enum ld_plugin_status (void*, int, const 
> > ld_plugin_symbol*)*' changed:
> >  in pointed to type 'function type enum ld_plugin_status 
> > (void*, int, const ld_plugin_symbol*)':
> >parameter 3 of type 'const ld_plugin_symbol*' has 
> > sub-type changes:
> >  in pointed to type 'const ld_plugin_symbol':
> >in unqualified underlying type 'struct 
> > ld_plugin_symbol' at plugin-api.h:86:1:
> >  type size changed from 256 to 288 (in bits)
> >  1 data member insertion:
> >'int ld_plugin_symbol::symbol_type', at offset 
> > 256 (in bits) at plugin-api.h:95:1
> >
> > So that I need to come up with ld_plugin_symbol_v2. It brings more 
> > challenges: one has 2 parallel symbol
> > tables:
> >
> > struct plugin_symtab
> > {
> > ...
> >struct ld_plugin_symbol_v2 *syms;
> >struct ld_plugin_symbol *syms_v1;
> > ...
> > };
> >
> > and the information of these should by aligned.
> >
> > The patch can survive lto.exp and I would like to ask H.J. to write 
> > bintuils counterpart that will
> > utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2.
> >
> > Thoughts?
>
> Can't we simply have _V4/V2 use the upper half of
> ld_plugin_symbol::def?  If the linker
> then requests _V4 but the plugin cannot cope it could still "use" the
> data but get
> LDST_UNKNOWN (zero) there.
>
> IMHO LDST_VARIABLE_BSS is "misplaced"?  "BSS" is the section of the variable.
> If we want to encode more of ELF it should be LDST_OBJECT and LDST_FUNC.
> Note there's also rodata vs data info that would be missing in case
> we'd want tools
> like readelf -s dump the symbol table of the IL part of an object.  It
> looks like
> nm can also 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-11 Thread Richard Biener via Gcc-patches
On Wed, Mar 11, 2020 at 10:19 AM Martin Liška  wrote:
>
> On 3/10/20 1:07 PM, Martin Liška wrote:
> > On 3/10/20 12:24 PM, Richard Biener wrote:
> >> Not sure how symtab is encoded right now but we also could have
> >
> > Ok, right now I don't see symtab entry much extensible.
> >
> > But what am I suggesting is to parse LTO bytecode version and then
> > process conditional parsing of lto_symtab section.
> >
> > Thoughts?
> > Martin
>
> So as H.J. correctly pointed I can't add new symbol_type into 
> ld_plugin_symbol struct.
> It would make ABI change as correctly identified by abidiff:
>
> abidiff /tmp/before.o /tmp/after.o
> Functions changes summary: 0 Removed, 1 Changed, 0 Added function
> Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
> Function symbols changes summary: 0 Removed, 0 Added function symbol not 
> referenced by debug info
> Variable symbols changes summary: 0 Removed, 1 Added variable symbol not 
> referenced by debug info
>
> 1 function with some indirect sub-type change:
>
>[C]'function ld_plugin_status onload(ld_plugin_tv*)' at 
> lto-plugin.c:1275:1 has some indirect sub-type changes:
>  parameter 1 of type 'ld_plugin_tv*' has sub-type changes:
>in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1:
>  type size hasn't changed
>  1 data member changes (1 filtered):
>   type of 'union {int tv_val; const char* tv_string; 
> ld_plugin_register_claim_file tv_register_claim_file; 
> ld_plugin_register_all_symbols_read tv_register_all_symbols_read; 
> ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols 
> tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; 
> ld_plugin_add_input_file tv_add_input_file; ld_plugin_message tv_message; 
> ld_plugin_get_input_file tv_get_input_file; ld_plugin_get_view tv_get_view; 
> ld_plugin_release_input_file tv_release_input_file; 
> ld_plugin_add_input_library tv_add_input_library; 
> ld_plugin_set_extra_library_path tv_set_extra_library_path; 
> ld_plugin_get_input_section_count tv_get_input_section_count; 
> ld_plugin_get_input_section_type tv_get_input_section_type; 
> ld_plugin_get_input_section_name tv_get_input_section_name; 
> ld_plugin_get_input_section_contents tv_get_input_section_contents; 
> ld_plugin_update_section_order tv_update_section_order; 
> ld_plugin_allow_section_ordering tv_allow_section_ordering; 
> ld_plugin_allow_unique_segment_for_sections 
> tv_allow_unique_segment_for_sections; ld_plugin_unique_segment_for_sections 
> tv_unique_segment_for_sections; ld_plugin_get_input_section_alignment 
> tv_get_input_section_alignment; ld_plugin_get_input_section_size 
> tv_get_input_section_size; ld_plugin_register_new_input 
> tv_register_new_input; ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} 
> ld_plugin_tv::tv_u' changed:
> type size hasn't changed
> 1 data member changes (1 filtered):
>  type of 'ld_plugin_add_symbols tv_add_symbols' changed:
>underlying type 'enum ld_plugin_status (void*, int, const 
> ld_plugin_symbol*)*' changed:
>  in pointed to type 'function type enum ld_plugin_status 
> (void*, int, const ld_plugin_symbol*)':
>parameter 3 of type 'const ld_plugin_symbol*' has sub-type 
> changes:
>  in pointed to type 'const ld_plugin_symbol':
>in unqualified underlying type 'struct 
> ld_plugin_symbol' at plugin-api.h:86:1:
>  type size changed from 256 to 288 (in bits)
>  1 data member insertion:
>'int ld_plugin_symbol::symbol_type', at offset 256 
> (in bits) at plugin-api.h:95:1
>
> So that I need to come up with ld_plugin_symbol_v2. It brings more 
> challenges: one has 2 parallel symbol
> tables:
>
> struct plugin_symtab
> {
> ...
>struct ld_plugin_symbol_v2 *syms;
>struct ld_plugin_symbol *syms_v1;
> ...
> };
>
> and the information of these should by aligned.
>
> The patch can survive lto.exp and I would like to ask H.J. to write bintuils 
> counterpart that will
> utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2.
>
> Thoughts?

Can't we simply have _V4/V2 use the upper half of
ld_plugin_symbol::def?  If the linker
then requests _V4 but the plugin cannot cope it could still "use" the
data but get
LDST_UNKNOWN (zero) there.

IMHO LDST_VARIABLE_BSS is "misplaced"?  "BSS" is the section of the variable.
If we want to encode more of ELF it should be LDST_OBJECT and LDST_FUNC.
Note there's also rodata vs data info that would be missing in case
we'd want tools
like readelf -s dump the symbol table of the IL part of an object.  It
looks like
nm can also distinguish rodata from data ("R", "r" vs "d") and "small object"
data sections (not sure what's that about).  It seems nm cannot distinguish
symbols in mergeable string sections (it dumps "R" for me there).  So intead
of mangling everything into enum 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-11 Thread Martin Liška

On 3/10/20 1:07 PM, Martin Liška wrote:

On 3/10/20 12:24 PM, Richard Biener wrote:

Not sure how symtab is encoded right now but we also could have


Ok, right now I don't see symtab entry much extensible.

But what am I suggesting is to parse LTO bytecode version and then
process conditional parsing of lto_symtab section.

Thoughts?
Martin


So as H.J. correctly pointed I can't add new symbol_type into ld_plugin_symbol 
struct.
It would make ABI change as correctly identified by abidiff:

abidiff /tmp/before.o /tmp/after.o
Functions changes summary: 0 Removed, 1 Changed, 0 Added function
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
Function symbols changes summary: 0 Removed, 0 Added function symbol not 
referenced by debug info
Variable symbols changes summary: 0 Removed, 1 Added variable symbol not 
referenced by debug info

1 function with some indirect sub-type change:

  [C]'function ld_plugin_status onload(ld_plugin_tv*)' at lto-plugin.c:1275:1 
has some indirect sub-type changes:
parameter 1 of type 'ld_plugin_tv*' has sub-type changes:
  in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1:
type size hasn't changed
1 data member changes (1 filtered):
 type of 'union {int tv_val; const char* tv_string; 
ld_plugin_register_claim_file tv_register_claim_file; 
ld_plugin_register_all_symbols_read tv_register_all_symbols_read; 
ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols 
tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; ld_plugin_add_input_file 
tv_add_input_file; ld_plugin_message tv_message; ld_plugin_get_input_file 
tv_get_input_file; ld_plugin_get_view tv_get_view; ld_plugin_release_input_file 
tv_release_input_file; ld_plugin_add_input_library tv_add_input_library; 
ld_plugin_set_extra_library_path tv_set_extra_library_path; 
ld_plugin_get_input_section_count tv_get_input_section_count; 
ld_plugin_get_input_section_type tv_get_input_section_type; 
ld_plugin_get_input_section_name tv_get_input_section_name; 
ld_plugin_get_input_section_contents tv_get_input_section_contents; 
ld_plugin_update_section_order tv_update_section_order; 
ld_plugin_allow_section_ordering tv_allow_section_ordering; 
ld_plugin_allow_unique_segment_for_sections 
tv_allow_unique_segment_for_sections; ld_plugin_unique_segment_for_sections 
tv_unique_segment_for_sections; ld_plugin_get_input_section_alignment 
tv_get_input_section_alignment; ld_plugin_get_input_section_size 
tv_get_input_section_size; ld_plugin_register_new_input tv_register_new_input; 
ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} ld_plugin_tv::tv_u' changed:
   type size hasn't changed
   1 data member changes (1 filtered):
type of 'ld_plugin_add_symbols tv_add_symbols' changed:
  underlying type 'enum ld_plugin_status (void*, int, const 
ld_plugin_symbol*)*' changed:
in pointed to type 'function type enum ld_plugin_status (void*, 
int, const ld_plugin_symbol*)':
  parameter 3 of type 'const ld_plugin_symbol*' has sub-type 
changes:
in pointed to type 'const ld_plugin_symbol':
  in unqualified underlying type 'struct ld_plugin_symbol' 
at plugin-api.h:86:1:
type size changed from 256 to 288 (in bits)
1 data member insertion:
  'int ld_plugin_symbol::symbol_type', at offset 256 
(in bits) at plugin-api.h:95:1

So that I need to come up with ld_plugin_symbol_v2. It brings more challenges: 
one has 2 parallel symbol
tables:

struct plugin_symtab
{
...
  struct ld_plugin_symbol_v2 *syms;
  struct ld_plugin_symbol *syms_v1;
...
};

and the information of these should by aligned.

The patch can survive lto.exp and I would like to ask H.J. to write bintuils 
counterpart that will
utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2.

Thoughts?
Martin


>From 1271e9cc487847c1aa07e5bc0470e8c5b71900e9 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH] API extension for binutils (type of symbols).

gcc/ChangeLog:

2020-03-11  Martin Liska  

	* lto-streamer-out.c (write_symbol): Stream type
	of symbol (function, variable - bss/data).

include/ChangeLog:

2020-03-11  Martin Liska  

	* lto-symtab.h (enum gcc_plugin_symbol_type):
	New.
	* plugin-api.h (struct ld_plugin_symbol_v2): New
	ld_plugin_symbol structure with added symbol_type.
	(enum ld_plugin_symbol_type): New.
	(ld_plugin_add_symbols_v2): New.
	(ld_plugin_get_symbols_v2): New.
	(ld_plugin_tag): Add LDPT_GET_SYMBOLS_V4
	and LDPT_ADD_SYMBOLS_V2.
	(struct ld_plugin_tv): Add tv_add_symbols_v2
	and tv_get_symbols_v4.

lto-plugin/ChangeLog:

2020-03-11  Martin Liska  

	* lto-plugin.c (LTO_LTO_PREFIX): New.
	(LTO_LTO_PREFIX_LEN): Likewise.
	(struct plugin_symtab): Make syms of new
	type ld_plugin_symbol_v2 and add syms_v1
	for older API hooks.
	(struct lto_section): New.
	

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-10 Thread Michael Matz
Hello,

On Tue, 10 Mar 2020, Martin Liška wrote:

> >>> We nee to support different variables, like TLS, data and bss variables.
> >>
> >> Why do we need TLS? Right now, it's not supported by nm.
> > 
> > Of course it does.  It's the 'T' (or 't') character.
> 
> Thank you reply!
> 
> Are you sure about it?

Err, I bungled in between writing emails, you are right, nm(1) in BSD mode 
doesn't make a difference for TLS symbols.  (And of course T/t are for 
text, aka code, symbols).

Doesn't invalidate the rest of my email, though.


Ciao,
Michael.


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-10 Thread Martin Liška

On 3/10/20 10:39 AM, Martin Liška wrote:


Are you sure about it?
$ gcc gcc/testsuite/gcc.target/i386/pr56564-3.c -c -fpic && nm pr56564-3.o
...
 D s
0010 D t

?


A nicer example:

$ cat tls.c
__thread struct S { long a, b; } s = { 0, 0 };
__thread char t[16] = { 7 };

$ gcc tls.c -c && nm -B tls.o
 B s
 D t

Well, TLS seems to me an orthogonal problem. Similarly other special symbols 
like
.func, .symver, alias attribute, ...

Martin


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-10 Thread Martin Liška

On 3/10/20 12:24 PM, Richard Biener wrote:

Not sure how symtab is encoded right now but we also could have


Ok, right now I don't see symtab entry much extensible.

But what am I suggesting is to parse LTO bytecode version and then
process conditional parsing of lto_symtab section.

Thoughts?
Martin
>From 017dd9cba9a0222104682bef094d9f5057d2c9ae Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH] API extension for binutils (type of symbols).

gcc/ChangeLog:

2020-03-09  Martin Liska  

	* lto-streamer-out.c (write_symbol): Stream
	symbol type.

include/ChangeLog:

2020-03-09  Martin Liska  

	* lto-symtab.h (enum gcc_plugin_symbol_type): New.
	* plugin-api.h (struct ld_plugin_symbol): New member
	symbols_type.
	(enum ld_plugin_symbol_type): New.
	(enum ld_plugin_tag): Add new tag LDPT_GET_SYMBOLS_V4.

lto-plugin/ChangeLog:

2020-03-09  Martin Liska  

	* lto-plugin.c (parse_table_entry): Parse symbol type.
	(LTO_LTO_PREFIX): New.
	(LTO_LTO_PREFIX_LEN): New.
	(struct lto_section): New.
---
 gcc/lto-streamer-out.c  | 14 
 include/lto-symtab.h|  8 +++
 include/plugin-api.h| 14 +++-
 lto-plugin/lto-plugin.c | 48 -
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cea5e71cffb..ead606eb665 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "output.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2773,6 +2774,19 @@ write_symbol (struct streamer_tree_cache_d *cache,
   lto_write_data (, 1);
   c = (unsigned char) visibility;
   lto_write_data (, 1);
+
+  gcc_plugin_symbol_type st;
+  if (TREE_CODE (t) == VAR_DECL)
+{
+  section *s = get_variable_section (t, false);
+  st = (s->common.flags & SECTION_BSS
+	? GCCST_VARIABLE_BSS : GCCST_VARIABLE_DATA);
+}
+  else
+st = GCCST_FUNCTION;
+
+  c = (unsigned char) st;
+  lto_write_data (, 1);
   lto_write_data (, 8);
   lto_write_data (_num, 4);
 }
diff --git a/include/lto-symtab.h b/include/lto-symtab.h
index 0ce0de10121..901bc3585c2 100644
--- a/include/lto-symtab.h
+++ b/include/lto-symtab.h
@@ -38,4 +38,12 @@ enum gcc_plugin_symbol_visibility
 GCCPV_HIDDEN
   };
 
+enum gcc_plugin_symbol_type
+{
+  GCCST_UNKNOWN,
+  GCCST_FUNCTION,
+  GCCST_VARIABLE_DATA,
+  GCCST_VARIABLE_BSS
+};
+
 #endif /* GCC_LTO_SYMTAB_H  */
diff --git a/include/plugin-api.h b/include/plugin-api.h
index 09e1202df07..794a2dcc4ee 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -92,6 +92,7 @@ struct ld_plugin_symbol
   uint64_t size;
   char *comdat_key;
   int resolution;
+  int symbol_type;
 };
 
 /* An object's section.  */
@@ -123,6 +124,16 @@ enum ld_plugin_symbol_visibility
   LDPV_HIDDEN
 };
 
+/* The type of the symbol.  */
+
+enum ld_plugin_symbol_type
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE_DATA,
+  LDST_VARIABLE_BSS
+};
+
 /* How a symbol is resolved.  */
 
 enum ld_plugin_symbol_resolution
@@ -431,7 +442,8 @@ enum ld_plugin_tag
   LDPT_GET_INPUT_SECTION_ALIGNMENT = 29,
   LDPT_GET_INPUT_SECTION_SIZE = 30,
   LDPT_REGISTER_NEW_INPUT_HOOK = 31,
-  LDPT_GET_WRAP_SYMBOLS = 32
+  LDPT_GET_WRAP_SYMBOLS = 32,
+  LDPT_GET_SYMBOLS_V4 = 33,
 };
 
 /* The plugin transfer vector.  */
diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c
index c307fc871bf..33afae9afb6 100644
--- a/lto-plugin/lto-plugin.c
+++ b/lto-plugin/lto-plugin.c
@@ -90,6 +90,8 @@ along with this program; see the file COPYING3.  If not see
 
 #define LTO_SECTION_PREFIX	".gnu.lto_.symtab"
 #define LTO_SECTION_PREFIX_LEN	(sizeof (LTO_SECTION_PREFIX) - 1)
+#define LTO_LTO_PREFIX		".gnu.lto_.lto"
+#define LTO_LTO_PREFIX_LEN	(sizeof (LTO_LTO_PREFIX) - 1)
 #define OFFLOAD_SECTION		".gnu.offload_lto_.opts"
 #define OFFLOAD_SECTION_LEN	(sizeof (OFFLOAD_SECTION) - 1)
 
@@ -221,6 +223,20 @@ check_1 (int gate, enum ld_plugin_level level, const char *text)
 }
 }
 
+/* Structure that represents LTO ELF section with information
+   about the format.  */
+
+struct lto_section
+ {
+   int16_t major_version;
+   int16_t minor_version;
+   unsigned char slim_object: 1;
+   unsigned char compression: 4;
+   int32_t reserved0: 27;
+};
+
+struct lto_section lto_header;
+
 /* This little wrapper allows check to be called with a non-integer
first argument, such as a pointer that must be non-NULL.  We can't
use c99 bool type to coerce it into range, so we explicitly test.  */
@@ -252,6 +268,14 @@ parse_table_entry (char *p, struct ld_plugin_symbol *entry,
   LDPV_HIDDEN
 };
 
+  enum ld_plugin_symbol_type symbol_types[] =
+{
+  LDST_UNKNOWN,
+  LDST_FUNCTION,
+  LDST_VARIABLE_DATA,
+  LDST_VARIABLE_BSS
+};
+
   switch (sym_style)
  

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-10 Thread Richard Biener
On Tue, Mar 10, 2020 at 12:09 PM Jan Hubicka  wrote:
>
> > > >>> @Honza/Richi: Do you have any opinion about that?
> > > >
> > > > I guess we indeed want to get as close to non-LTO nm behaviour as
> > > > possible. So we want to support them and perhaps think of .symtab
> > > > section file format that can be made backward compatible (such as having
> > > > attribute string for symbols where we can add new info in future in a
> > > > way that old plugins will still get info they want).
> > >
> > > I like the idea. But it's probably next stage1 material. Or can you 
> > > prepare
> > > a patch?
> >
> > I think what's important is that the LTO plugin needs to understand
> > the old and the new version since there's only one for auto-loading.
> >
> > The other missing feature of the linker plugin API is file claiming
> > which should be a on a section basis instead - but that's a different
> > part of the API and not related to symbol tables.  Enhancing that
> > part of the API would allow to elide the LTO debug copying ...
>
> Thinking of it, it seems to me that we do not need to break
> compatibility with existing plugins.  We could keep existing .symtab
> section the way it is implemented right now
> and add additional data to new .symtab_ext section so existing plugins
> will work as expeted.
>
> We could tag symtab_ext by a version string and keep adding extensions
> of extensions in the future being compatible with old plugins.

Not sure how symtab is encoded right now but we also could have

entry1: symbol 
entry2: ^ext-ver-n 
entry3: ^ext-ver-m 
entry4: symbol 

and ^ext-ver-X being escape byte(s) denoting we're providing
additional data for the preceeding symbol.  Not sure if there's
something conveniently available in the current encoding that
would make older plugins skip an entry ;)

> Indeed that would help situation user already has distro provided plugin
> in the search path but compiles its own gcc10.
>
> Honza


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-10 Thread Jan Hubicka
> > >>> @Honza/Richi: Do you have any opinion about that?
> > >
> > > I guess we indeed want to get as close to non-LTO nm behaviour as
> > > possible. So we want to support them and perhaps think of .symtab
> > > section file format that can be made backward compatible (such as having
> > > attribute string for symbols where we can add new info in future in a
> > > way that old plugins will still get info they want).
> >
> > I like the idea. But it's probably next stage1 material. Or can you prepare
> > a patch?
> 
> I think what's important is that the LTO plugin needs to understand
> the old and the new version since there's only one for auto-loading.
> 
> The other missing feature of the linker plugin API is file claiming
> which should be a on a section basis instead - but that's a different
> part of the API and not related to symbol tables.  Enhancing that
> part of the API would allow to elide the LTO debug copying ...

Thinking of it, it seems to me that we do not need to break
compatibility with existing plugins.  We could keep existing .symtab
section the way it is implemented right now
and add additional data to new .symtab_ext section so existing plugins
will work as expeted.

We could tag symtab_ext by a version string and keep adding extensions
of extensions in the future being compatible with old plugins.

Indeed that would help situation user already has distro provided plugin
in the search path but compiles its own gcc10.

Honza


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-10 Thread Richard Biener
On Tue, Mar 10, 2020 at 11:05 AM Martin Liška  wrote:
>
> On 3/9/20 9:19 PM, Jan Hubicka wrote:
> >> On Mon, Mar 9, 2020 at 9:56 AM Martin Liška  wrote:
> >>>
> >>> On 3/9/20 4:36 PM, H.J. Lu wrote:
>  We nee to support different variables, like TLS, data and bss variables.
> >>>
> >>> Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?
> >>
> >> Since you are introducing symbol types, why not support TLS?
> >>
> >>> About BSS and DATA I agree that it would be handy. I can theoretically
> >>> covered with code in get_variable_section/bss_initializer_p. But it's
> >>> quite logic and I'm not sure we should simulate it.
> >
> > I think it should not be that hard to factor out the logic from
> > get_variable_section to return enum of what we want to do and then
> > have get_variale_section as a wrapper parsing this enum to actual
> > section.
>
> So it was easier that I expected and I'm sending updated version
> of the patch.
>
> >>>
> >>> @Honza/Richi: Do you have any opinion about that?
> >
> > I guess we indeed want to get as close to non-LTO nm behaviour as
> > possible. So we want to support them and perhaps think of .symtab
> > section file format that can be made backward compatible (such as having
> > attribute string for symbols where we can add new info in future in a
> > way that old plugins will still get info they want).
>
> I like the idea. But it's probably next stage1 material. Or can you prepare
> a patch?

I think what's important is that the LTO plugin needs to understand
the old and the new version since there's only one for auto-loading.

The other missing feature of the linker plugin API is file claiming
which should be a on a section basis instead - but that's a different
part of the API and not related to symbol tables.  Enhancing that
part of the API would allow to elide the LTO debug copying ...

> >
> > Of course IPA optimizations may migrate symbols around (say from data to
> > bss)/take them away/rename them, but with that we need to live. I would
> > expect most tools inspecting nm are interested in what will enter
> > linking not what will be in final output.
>
> Yes, there are mostly used during configure script run where they commonly
> do not take final linked executables/shared libs.
>
> >
> > Since we discuss plugin extensions (and I do not want this to complicate
> > finishing Martin's patch).
>
> Please suggest that in another patch.
>
> The current situation with binutils is bad because we can't build distribution
> with -fno-common and LTO.
>
> Martin
>
> >  Are we aware of other plugin limitations?
> > One thing that I consider unsafe is the way we produce local names when
> > we need to promote symbol to hidden due to partitining.  We add
> > .lto_priv, but that is not safe if we link with .o file that was
> > incrementally lto-optimized to target object file (this is reason why I
> > did not enabled WHOPR path for it).
> >
> > We may also want to inform lld and llvm's gold plugin maintainers about
> > intended changes.
> > Honza
> >>>
> >>> Thanks,
> >>> Martin
> >>
> >>
> >>
> >> --
> >> H.J.
>


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-10 Thread Martin Liška

On 3/9/20 9:19 PM, Jan Hubicka wrote:

On Mon, Mar 9, 2020 at 9:56 AM Martin Liška  wrote:


On 3/9/20 4:36 PM, H.J. Lu wrote:

We nee to support different variables, like TLS, data and bss variables.


Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?


Since you are introducing symbol types, why not support TLS?


About BSS and DATA I agree that it would be handy. I can theoretically
covered with code in get_variable_section/bss_initializer_p. But it's
quite logic and I'm not sure we should simulate it.


I think it should not be that hard to factor out the logic from
get_variable_section to return enum of what we want to do and then
have get_variale_section as a wrapper parsing this enum to actual
section.


So it was easier that I expected and I'm sending updated version
of the patch.



@Honza/Richi: Do you have any opinion about that?


I guess we indeed want to get as close to non-LTO nm behaviour as
possible. So we want to support them and perhaps think of .symtab
section file format that can be made backward compatible (such as having
attribute string for symbols where we can add new info in future in a
way that old plugins will still get info they want).


I like the idea. But it's probably next stage1 material. Or can you prepare
a patch?



Of course IPA optimizations may migrate symbols around (say from data to
bss)/take them away/rename them, but with that we need to live. I would
expect most tools inspecting nm are interested in what will enter
linking not what will be in final output.


Yes, there are mostly used during configure script run where they commonly
do not take final linked executables/shared libs.



Since we discuss plugin extensions (and I do not want this to complicate
finishing Martin's patch).


Please suggest that in another patch.

The current situation with binutils is bad because we can't build distribution
with -fno-common and LTO.

Martin


 Are we aware of other plugin limitations?
One thing that I consider unsafe is the way we produce local names when
we need to promote symbol to hidden due to partitining.  We add
.lto_priv, but that is not safe if we link with .o file that was
incrementally lto-optimized to target object file (this is reason why I
did not enabled WHOPR path for it).

We may also want to inform lld and llvm's gold plugin maintainers about
intended changes.
Honza


Thanks,
Martin




--
H.J.


>From 4214743fc011fd8900a89166759a5511d8da5da2 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 6 Mar 2020 18:09:35 +0100
Subject: [PATCH] API extension for binutils (type of symbols).

gcc/ChangeLog:

2020-03-09  Martin Liska  

	* lto-streamer-out.c (write_symbol): Stream
	symbol type.

include/ChangeLog:

2020-03-09  Martin Liska  

	* lto-symtab.h (enum gcc_plugin_symbol_type): New.
	* plugin-api.h (struct ld_plugin_symbol): New member
	symbols_type.
	(enum ld_plugin_symbol_type): New.
	(enum ld_plugin_tag): Add new tag LDPT_GET_SYMBOLS_V4.

lto-plugin/ChangeLog:

2020-03-09  Martin Liska  

	* lto-plugin.c (parse_table_entry): Parse symbol type.
---
 gcc/lto-streamer-out.c  | 14 ++
 include/lto-symtab.h|  8 
 include/plugin-api.h| 14 +-
 lto-plugin/lto-plugin.c | 13 +
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index cea5e71cffb..ead606eb665 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "print-tree.h"
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
+#include "output.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2773,6 +2774,19 @@ write_symbol (struct streamer_tree_cache_d *cache,
   lto_write_data (, 1);
   c = (unsigned char) visibility;
   lto_write_data (, 1);
+
+  gcc_plugin_symbol_type st;
+  if (TREE_CODE (t) == VAR_DECL)
+{
+  section *s = get_variable_section (t, false);
+  st = (s->common.flags & SECTION_BSS
+	? GCCST_VARIABLE_BSS : GCCST_VARIABLE_DATA);
+}
+  else
+st = GCCST_FUNCTION;
+
+  c = (unsigned char) st;
+  lto_write_data (, 1);
   lto_write_data (, 8);
   lto_write_data (_num, 4);
 }
diff --git a/include/lto-symtab.h b/include/lto-symtab.h
index 0ce0de10121..901bc3585c2 100644
--- a/include/lto-symtab.h
+++ b/include/lto-symtab.h
@@ -38,4 +38,12 @@ enum gcc_plugin_symbol_visibility
 GCCPV_HIDDEN
   };
 
+enum gcc_plugin_symbol_type
+{
+  GCCST_UNKNOWN,
+  GCCST_FUNCTION,
+  GCCST_VARIABLE_DATA,
+  GCCST_VARIABLE_BSS
+};
+
 #endif /* GCC_LTO_SYMTAB_H  */
diff --git a/include/plugin-api.h b/include/plugin-api.h
index 09e1202df07..794a2dcc4ee 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -92,6 +92,7 @@ struct ld_plugin_symbol
   uint64_t size;
   char *comdat_key;
   int resolution;
+  int symbol_type;
 };
 
 /* An object's section.  */
@@ -123,6 +124,16 @@ enum 

Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-10 Thread Martin Liška

On 3/9/20 8:45 PM, Michael Matz wrote:

Hello,

On Mon, 9 Mar 2020, Martin Liška wrote:


On 3/9/20 4:36 PM, H.J. Lu wrote:

We nee to support different variables, like TLS, data and bss variables.


Why do we need TLS? Right now, it's not supported by nm.


Of course it does.  It's the 'T' (or 't') character.


Thank you reply!

Are you sure about it?
$ gcc gcc/testsuite/gcc.target/i386/pr56564-3.c -c -fpic && nm pr56564-3.o
...
 D s
0010 D t

?


 When you introduce
symbol categories into the plugin system it would be advisable to include
all we usually care about, and as the ELF categories are (roughly) a
superset of everything we support, I'd say that should be the list to look
at.  I.e. a mixture of visibility, locality (aka binding) and type:


I agree with that.



{object,function,common,tls}


Here LDPK_COMMON is already handled.


  x {local,global,weak,unique}
  x {default,internal,hidden,protected}

That doesn't include symbols types section,file,ifunc or os or arch
specific types or visibilities or bindings.  But it would probably not be
the worst idea to simply encode what we need with ELF constants and names.
While not all the world is ELF, all concepts we have can be mapped onto
ELF.


Ciao,
Michael.





Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-09 Thread Jan Hubicka
> On Mon, Mar 9, 2020 at 9:56 AM Martin Liška  wrote:
> >
> > On 3/9/20 4:36 PM, H.J. Lu wrote:
> > > We nee to support different variables, like TLS, data and bss variables.
> >
> > Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?
> 
> Since you are introducing symbol types, why not support TLS?
> 
> > About BSS and DATA I agree that it would be handy. I can theoretically
> > covered with code in get_variable_section/bss_initializer_p. But it's
> > quite logic and I'm not sure we should simulate it.

I think it should not be that hard to factor out the logic from
get_variable_section to return enum of what we want to do and then
have get_variale_section as a wrapper parsing this enum to actual
section.
> >
> > @Honza/Richi: Do you have any opinion about that?

I guess we indeed want to get as close to non-LTO nm behaviour as
possible. So we want to support them and perhaps think of .symtab
section file format that can be made backward compatible (such as having
attribute string for symbols where we can add new info in future in a
way that old plugins will still get info they want).

Of course IPA optimizations may migrate symbols around (say from data to
bss)/take them away/rename them, but with that we need to live. I would
expect most tools inspecting nm are interested in what will enter
linking not what will be in final output.

Since we discuss plugin extensions (and I do not want this to complicate
finishing Martin's patch).  Are we aware of other plugin limitations?
One thing that I consider unsafe is the way we produce local names when
we need to promote symbol to hidden due to partitining.  We add
.lto_priv, but that is not safe if we link with .o file that was
incrementally lto-optimized to target object file (this is reason why I
did not enabled WHOPR path for it). 

We may also want to inform lld and llvm's gold plugin maintainers about
intended changes.
Honza
> >
> > Thanks,
> > Martin
> 
> 
> 
> -- 
> H.J.


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-09 Thread Michael Matz
Hello,

On Mon, 9 Mar 2020, Martin Liška wrote:

> On 3/9/20 4:36 PM, H.J. Lu wrote:
> > We nee to support different variables, like TLS, data and bss variables.
> 
> Why do we need TLS? Right now, it's not supported by nm.

Of course it does.  It's the 'T' (or 't') character.  When you introduce 
symbol categories into the plugin system it would be advisable to include 
all we usually care about, and as the ELF categories are (roughly) a 
superset of everything we support, I'd say that should be the list to look 
at.  I.e. a mixture of visibility, locality (aka binding) and type:

   {object,function,common,tls}
 x {local,global,weak,unique}
 x {default,internal,hidden,protected}

That doesn't include symbols types section,file,ifunc or os or arch 
specific types or visibilities or bindings.  But it would probably not be 
the worst idea to simply encode what we need with ELF constants and names.  
While not all the world is ELF, all concepts we have can be mapped onto 
ELF.


Ciao,
Michael.


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-09 Thread H.J. Lu
On Mon, Mar 9, 2020 at 9:56 AM Martin Liška  wrote:
>
> On 3/9/20 4:36 PM, H.J. Lu wrote:
> > We nee to support different variables, like TLS, data and bss variables.
>
> Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?

Since you are introducing symbol types, why not support TLS?

> About BSS and DATA I agree that it would be handy. I can theoretically
> covered with code in get_variable_section/bss_initializer_p. But it's
> quite logic and I'm not sure we should simulate it.
>
> @Honza/Richi: Do you have any opinion about that?
>
> Thanks,
> Martin



-- 
H.J.


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-09 Thread Martin Liška

On 3/9/20 4:36 PM, H.J. Lu wrote:

We nee to support different variables, like TLS, data and bss variables.


Why do we need TLS? Right now, it's not supported by nm. Or am I wrong?

About BSS and DATA I agree that it would be handy. I can theoretically
covered with code in get_variable_section/bss_initializer_p. But it's
quite logic and I'm not sure we should simulate it.

@Honza/Richi: Do you have any opinion about that?

Thanks,
Martin


Re: [PATCH][RFC] API extension for binutils (type of symbols).

2020-03-09 Thread H.J. Lu
On Mon, Mar 9, 2020 at 2:30 AM Martin Liška  wrote:
>
> Hi.
>
> With change of -fno-common default for GCC 10 we're facing serious
> problem with LTO where one can't distinguish in between global symbols
> and variables based on the output of nm:
> https://sourceware.org/bugzilla/show_bug.cgi?id=25355
>
> $ cat nm.c
> char nm_test_var;
>
> $ gcc-9 nm.c -c -fno-common
> $ nm nm.o
>  B nm_test_var
>
> $ gcc-9 nm.c -c -fno-common -flto
> $ nm nm.o
>  T nm_test_var
>
> H.J. decided to implement quite heavy solution which is about usage of 
> lto-wrapper
> that takes a LTO object file and makes a final assembly file. The file is then
> utilized with nm. That has some disadvantages:
>
> - it's slow - using nm x.a can take very long time
> - way of finding lto-wrapper is quite hard-coded to location of LTO plugin
> - we face issues with multiple final object files:
>https://sourceware.org/bugzilla/show_bug.cgi?id=25640
>
> That said, I'm suggesting to expect LTO plugin API to tell binutils whether
> a symbol is variable or function. That should help us to mark global variables
> with "D" in nm output.
>
> I would like to note that even with -fcommon, the nm output for LTO bytecode 
> is far
> from perfect:
>
> $ cat bss.c
> int global_zero;
> int global_one = 1;
>
> $ gcc-9 bss.c -c  -flto
> $ nm bss.o
>  T global_one
>  C global_zero
>
> I believe in this case we can mark both symbols with D as a reasonable guess.
>
> Thoughts?
> Martin
>
> gcc/ChangeLog:
>
> 2020-03-09  Martin Liska  
>
> * lto-streamer-out.c (write_symbol): Stream
> symbol type.
>
> include/ChangeLog:
>
> 2020-03-09  Martin Liska  
>
> * lto-symtab.h (enum gcc_plugin_symbol_type): New.
> * plugin-api.h (struct ld_plugin_symbol): New member
> symbols_type.
> (enum ld_plugin_symbol_type): New.
> (enum ld_plugin_tag): Add new tag LDPT_GET_SYMBOLS_V4.
>
> lto-plugin/ChangeLog:
>
> 2020-03-09  Martin Liska  
>
> * lto-plugin.c (parse_table_entry): Parse symbol type.
> ---
>   gcc/lto-streamer-out.c  |  2 ++
>   include/lto-symtab.h|  7 +++
>   include/plugin-api.h| 13 -
>   lto-plugin/lto-plugin.c | 12 
>   4 files changed, 33 insertions(+), 1 deletion(-)
>
>

We nee to support different variables, like TLS, data and bss variables.

-- 
H.J.