[Lldb-commits] [PATCH] D145180: [lldb] Introduce new SymbolFileJSON and ObjectFileJSON

Greg Clayton via Phabricator via lldb-commits Thu, 02 Mar 2023 21:11:52 -0800

clayborg added a comment.

I like this idea of this, but I would like to see this be a bit more complete. 
One idea is to remove the ObjectFileJSON::Symbol definition and just use 
lldb_private::Symbol objects and allow these objects to construct encode and 
decode from JSON. Then we have the ability to re-create any symbols we need in 
full fidelity. But that not be the goal in your case, but I don't think it 
would hurt to use the lldb_private::Symbol as the object we serialize and 
deserialize from JSON as ObjectFileJSON::Symbol just can't reproduce the full 
depth of the symbols we want.


From reading this it looks like your main use case is to supply additional 
symbols to an existing stripped binary. Is that correct? Are you aware that 
each dSYM file has a full copy of the unstripped symbol table already? It 
removes the debug map entries, but each dSYM copies all non debug map symbols 
inside of it already. So the same thing from this patch can already be 
accomplished by stripping a dSYM file of all debug info sections and leaving 
the symbol table alone, and then using this this minimal dSYM file just to get 
the symbols.

Any idea on where the JSON file will live if it is just a companion file for 
another mach-o or ELF executable? Will it always be next to the mach-o 
executable? Will we enable a Spotlight importer to find it like we do for dSYM 
files?



================
Comment at: lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.cpp:133
+void ObjectFileJSON::ParseSymtab(Symtab &symtab) {
+  // Nothing to do for JSON files, all information is parsed as debug info.
+}
----------------
Don't we want to create a Symtab from the ObjectFileJSON::Symbol vector here? 
Symbols are not considered debug info. Debug info for functions is supposed to 
be more rich than just address and name.


================
Comment at: lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.h:92-95
+  struct Header {
+    std::string triple;
+    std::string uuid;
+  };
----------------
It would be great to be able to also define sections somewhere. I can think of 
this being really handy for core file serialization where we generate a core 
file and then store extra metadata that describes the object file in this JSON 
format. It would be great to have sections for this, and we really need 
sections to make ObjectFileJSON be able to represent something that actually 
looks like a real object file. See Section.h for  what we expect  section to 
have, but we can just store uint64_t values instead of lldb_private::Address 
values for the start address of the range for the section.


================
Comment at: lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.h:97-100
+  struct Symbol {
+    uint64_t addr;
+    std::string name;
+  };
----------------
A few things to think about:
- some symbols have a byte size (like in ELF or mach-o symbols in the debug map 
that have pairs)
- some symbol have absolute values that are not actually addresses, there is no 
way to represent this currently here
- some symbols have symbol types. We could allow symbols to specify a 
lldb::SymbolType by name?
- If we have symbol types other than just symbols with addresses, some might 
refer to a sibling symbol by ID or index. It might be good to allow symbols to 
have integer ids

One suggestion would be to define Symbol as:

```
 struct Symbol {
    uint64_t value;
    std::optional<uint64_t> size;
    std::optional<uint64_t> id; 
    std::optional<SymbolType> type; 
    std::string name;
    bool value_is_address; // Set to true if "value" is an address, false if 
"value" is just an integer with a different meaning
  };
```


================
Comment at: lldb/source/Plugins/SymbolFile/JSON/SymbolFileJSON.cpp:87
+
+void SymbolFileJSON::AddSymbols(Symtab &symtab) {
+  auto json_object_file = dyn_cast_or_null<ObjectFileJSON>(m_objfile_sp.get());
----------------
If we have no debug info i SymbolFileJSON, then there is really no need to add 
the symbols here, we can just avoid this whole SymbolFileJSON class and have 
ObjectFileJSON parse the JSON symbols and add them to the Symtab in:
```
void ObjectFileJSON::ParseSymtab(Symtab &symtab);
```
So if we are not going to add JSON debug info to the ObjectFileJSON schema, 
then I would vote to remove this class and just do everything from 
ObjectFileJSON.


================
Comment at: lldb/source/Plugins/SymbolFile/JSON/SymbolFileJSON.cpp:112
+    symtab.AddSymbol(Symbol(
+        /*symID*/ 0, Mangled(symbol.name), eSymbolTypeCode,
+        /*is_global*/ true, /*is_debug*/ false,
----------------
JDevlieghere wrote:
> mib wrote:
> > Should we increment the `symID` here ?
> The IDs are arbitrary, and if we start at zero, we'll have conflicts with the 
> ones already in the symbol table (e.g. the lldb_unnamed_symbols for stripped 
> binaries). We could check the size of the symtab and continue counting from 
> there. Or we can use 0 like we did here to indicate that these values are 
> "special". I went with the latter approach mostly because that's what 
> SymbolFileBreakpad does too. 
regardless of where the code goes that converts ObjectFileJSON::Symbol to 
lldb_private::Symbol, I would vote that we include enough information in 
ObjectFileJSON::Symbol so that we don't have to make up symbols IDs, or set all 
symbol IDs to zero. LLDB relies on having unique symbol IDs in each 
lldb_private::Symbol instance. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145180/new/

https://reviews.llvm.org/D145180

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [PATCH] D145180: [lldb] Introduce new SymbolFileJSON and ObjectFileJSON

Reply via email to