[Lldb-commits] [lldb] [llvm] [lldb] Support disassembling RISC-V proprietary instructions (PR #145793)

via lldb-commits Mon, 14 Jul 2025 08:38:52 -0700

https://github.com/tedwoodward updated https://github.com/llvm/llvm-project/pull/145793


>From 1a7ee4297bb8e6b3fa08818e05cf245a2c768c2b Mon Sep 17 00:00:00 2001
From: Ted Woodward <tedw...@quicinc.com>
Date: Wed, 25 Jun 2025 14:22:28 -0700
Subject: [PATCH 1/9] Support disassembling RISC-V proprietary insns

RISC-V supports proprietary extensions, where the TD files don't know
about certain instructions, and the disassembler can't disassemble them.
Internal users want to be able to disassemble these instructions.

With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch modifies LLDB's disassembly to look
more like llvm-objdump's, and includes an example python script that adds
a command "fdis" that will disassemble, then pipe the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .
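As a rough illustration of what such a filter does (this is not crustfilt,
and the opcode table is hypothetical), a filter only needs to read the
disassembly text and substitute a mnemonic on lines whose opcode it
recognizes. A minimal sketch, assuming the opcode text appears on the same
line as the "<unknown>" marker:

```python
# Hypothetical table mapping opcode text, as printed by "disassemble -b",
# to a made-up mnemonic. A real filter would decode the opcode bits instead.
KNOWN_OPCODES = {
    "0940003f 00200020": "Fake64",
}


def filter_line(line):
    """Replace <unknown> with a mnemonic when the line has a known opcode."""
    if "<unknown>" not in line:
        return line
    for opcode, mnemonic in KNOWN_OPCODES.items():
        if opcode in line:
            return line.replace("<unknown>", mnemonic)
    return line


def filter_stream(src, dst):
    """Copy every line from src to dst, rewriting the ones we recognize."""
    for line in src:
        dst.write(filter_line(line))
```

A standalone filter program would call filter_stream(sys.stdin, sys.stdout);
lines it does not recognize pass through unchanged.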

Changes in this PR:
- Decouple "can't disassemble" from "instruction size".
  DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for
    valid disassembly, and has the size as an out parameter.
  Use the size even if the disassembly is invalid.
  Disassemble if disassembly is valid.

- Always print out the opcode when -b is specified.
  Previously it wouldn't print out the opcode if it couldn't disassemble.

- Print out RISC-V opcodes the way llvm-objdump does.
  Add DumpRISCV method based on RISC-V pretty printer in llvm-objdump.cpp.

- Print <unknown> for instructions that can't be disassembled, matching
  llvm-objdump, instead of printing nothing.

- Update max riscv32 and riscv64 instruction size to 8.

- Add example "fdis" command script.
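
The opcode grouping used for RISC-V can be modeled in a few lines of Python.
This is only a sketch of the grouping rule described above, not the C++
code in this patch: format_riscv_opcode is a hypothetical name, and the
column padding is left out.

```python
def format_riscv_opcode(data: bytes) -> str:
    """Group little-endian opcode bytes the way the RISC-V dump does.

    Sizes divisible by 4 print as 32-bit words, other even sizes print as
    16-bit halfwords, and anything else falls back to single bytes.
    """
    size = len(data)
    if size % 4 == 0:
        step = 4
    elif size % 2 == 0:
        step = 2
    else:
        step = 1
    # Reverse each group so the most significant byte is printed first.
    groups = [data[i:i + step][::-1].hex() for i in range(0, size, step)]
    return " ".join(groups)
```

For example, the six bytes 1f 02 00 00 00 10 are printed as three 16-bit
values, while a 64-bit instruction is printed as two 32-bit values.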

Change-Id: Ie5a359d9e87a12dde79a8b5c9c7a146440a550c5
---
 lldb/examples/python/filter_disasm.py         | 87 +++++++++++++++++++
 lldb/include/lldb/Core/Opcode.h               |  1 +
 lldb/source/Core/Disassembler.cpp             | 14 ++-
 lldb/source/Core/Opcode.cpp                   | 38 ++++++++
 .../Disassembler/LLVMC/DisassemblerLLVMC.cpp  | 39 +++++----
 lldb/source/Utility/ArchSpec.cpp              |  4 +-
 6 files changed, 160 insertions(+), 23 deletions(-)
 create mode 100644 lldb/examples/python/filter_disasm.py

diff --git a/lldb/examples/python/filter_disasm.py b/lldb/examples/python/filter_disasm.py
new file mode 100644
index 0000000000000..adb3455209055
--- /dev/null
+++ b/lldb/examples/python/filter_disasm.py
@@ -0,0 +1,87 @@
+"""
+Defines a command, fdis, that does filtered disassembly. The command does the
+lldb disassemble command with -b and any other arguments passed in, and
+pipes that through a provided filter program.
+
+The intention is to support disassembly of RISC-V proprietary instructions.
+This is handled with llvm-objdump by piping the output of llvm-objdump through
+a filter program. This script is intended to mimic that workflow.
+"""
+
+import lldb
+import subprocess
+
+filter_program = "crustfilt"
+
+def __lldb_init_module(debugger, dict):
+    debugger.HandleCommand(
+        'command script add -f filter_disasm.fdis fdis')
+    print("Disassembly filter command (fdis) loaded")
+    print("Filter program set to %s" % filter_program)
+
+
+def fdis(debugger, args, result, dict):
+    """
+  Call the built in disassembler, then pass its output to a filter program
+  to add in disassembly for hidden opcodes.
+  Except for get and set, use the fdis command like the disassemble command.
+  By default, the filter program is crustfilt, from
+  https://github.com/quic/crustfilt . This can be changed by changing
+  the global variable filter_program.
+
+  Usage:
+    fdis [[get] [set <program>] [<disassembly options>]]
+
+    Choose one of the following:
+        get
+            Gets the current filter program
+
+        set <program>
+            Sets the current filter program. This can be an executable, which
+            will be found on PATH, or an absolute path.
+
+        <disassembly options>
+            If the first argument is not get or set, the args will be passed
+            to the disassemble command as is.
+
+    """
+
+    global filter_program
+    args_list = args.split(' ')
+    result.Clear()
+
+    if len(args_list) == 1 and args_list[0] == 'get':
+        result.PutCString(filter_program)
+        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
+        return
+
+    if len(args_list) == 2 and args_list[0] == 'set':
+        filter_program = args_list[1]
+        result.PutCString("Filter program set to %s" % filter_program)
+        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
+        return
+
+    res = lldb.SBCommandReturnObject()
+    debugger.GetCommandInterpreter().HandleCommand('disassemble -b ' + args, res)
+    if (len(res.GetError()) > 0):
+        result.SetError(res.GetError())
+        result.SetStatus(lldb.eReturnStatusFailed)
+        return
+    output = res.GetOutput()
+
+    try:
+        proc = subprocess.run([filter_program], capture_output=True, text=True, input=output)
+    except (subprocess.SubprocessError, OSError) as e:
+        result.PutCString("Error occurred. Original disassembly:\n\n" + output)
+        result.SetError(str(e))
+        result.SetStatus(lldb.eReturnStatusFailed)
+        return
+
+    print(proc.stderr)
+    if proc.stderr:
+        pass
+        #result.SetError(proc.stderr)
+        #result.SetStatus(lldb.eReturnStatusFailed)
+    else:
+        result.PutCString(proc.stdout)
+        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
diff --git a/lldb/include/lldb/Core/Opcode.h b/lldb/include/lldb/Core/Opcode.h
index f72f2687b54fe..88ef17093d3f3 100644
--- a/lldb/include/lldb/Core/Opcode.h
+++ b/lldb/include/lldb/Core/Opcode.h
@@ -200,6 +200,7 @@ class Opcode {
   }
 
   int Dump(Stream *s, uint32_t min_byte_width);
+  int DumpRISCV(Stream *s, uint32_t min_byte_width);
 
   const void *GetOpcodeBytes() const {
     return ((m_type == Opcode::eTypeBytes) ? m_data.inst.bytes : nullptr);
diff --git a/lldb/source/Core/Disassembler.cpp b/lldb/source/Core/Disassembler.cpp
index 833e327579a29..f95e446448036 100644
--- a/lldb/source/Core/Disassembler.cpp
+++ b/lldb/source/Core/Disassembler.cpp
@@ -658,8 +658,13 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
       // the byte dump to be able to always show 15 bytes (3 chars each) plus a
       // space
       if (max_opcode_byte_size > 0)
-        m_opcode.Dump(&ss, max_opcode_byte_size * 3 + 1);
-      else
+        // make RISC-V opcode dump look like llvm-objdump
+        if (exe_ctx &&
+            exe_ctx->GetTargetSP()->GetArchitecture().GetTriple().isRISCV())
+          m_opcode.DumpRISCV(&ss, max_opcode_byte_size * 3 + 1);
+        else
+          m_opcode.Dump(&ss, max_opcode_byte_size * 3 + 1);
+       else
         m_opcode.Dump(&ss, 15 * 3 + 1);
     } else {
       // Else, we have ARM or MIPS which can show up to a uint32_t 0x00000000
@@ -685,10 +690,13 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
     }
   }
   const size_t opcode_pos = ss.GetSizeOfLastLine();
-  const std::string &opcode_name =
+  std::string &opcode_name =
       show_color ? m_markup_opcode_name : m_opcode_name;
   const std::string &mnemonics = show_color ? m_markup_mnemonics : m_mnemonics;
 
+  if (opcode_name.empty())
+    opcode_name = "<unknown>";
+
   // The default opcode size of 7 characters is plenty for most architectures
   // but some like arm can pull out the occasional vqrshrun.s16.  We won't get
   // consistent column spacing in these cases, unfortunately. Also note that we
diff --git a/lldb/source/Core/Opcode.cpp b/lldb/source/Core/Opcode.cpp
index 3e30d98975d8a..dbcd18cc0d8d2 100644
--- a/lldb/source/Core/Opcode.cpp
+++ b/lldb/source/Core/Opcode.cpp
@@ -78,6 +78,44 @@ lldb::ByteOrder Opcode::GetDataByteOrder() const {
   return eByteOrderInvalid;
 }
 
+// make RISC-V byte dumps look like llvm-objdump, instead of just dumping bytes
+int Opcode::DumpRISCV(Stream *s, uint32_t min_byte_width) {
+  const uint32_t previous_bytes = s->GetWrittenBytes();
+  // if m_type is not bytes, call Dump
+  if (m_type != Opcode::eTypeBytes)
+    return Dump(s, min_byte_width);
+
+  // from RISCVPrettyPrinter in llvm-objdump.cpp
+  // if size % 4 == 0, print as 1 or 2 32 bit values (32 or 64 bit inst)
+  // else if size % 2 == 0, print as 1 or 3 16 bit values (16 or 48 bit inst)
+  // else fall back and print bytes
+  for (uint32_t i = 0; i < m_data.inst.length;) {
+    if (i > 0)
+      s->PutChar(' ');
+    if (!(m_data.inst.length % 4)) {
+      s->Printf("%2.2x%2.2x%2.2x%2.2x", m_data.inst.bytes[i + 3],
+                                        m_data.inst.bytes[i + 2],
+                                        m_data.inst.bytes[i + 1],
+                                        m_data.inst.bytes[i + 0]);
+      i += 4;
+    } else if (!(m_data.inst.length % 2)) {
+      s->Printf("%2.2x%2.2x", m_data.inst.bytes[i + 1],
+                              m_data.inst.bytes[i + 0]);
+      i += 2;
+    } else {
+      s->Printf("%2.2x", m_data.inst.bytes[i]);
+      ++i;
+    }
+  }
+
+  uint32_t bytes_written_so_far = s->GetWrittenBytes() - previous_bytes;
+  // Add spaces to make sure bytes display comes out even in case opcodes aren't
+  // all the same size.
+  if (bytes_written_so_far < min_byte_width)
+    s->Printf("%*s", min_byte_width - bytes_written_so_far, "");
+  return s->GetWrittenBytes() - previous_bytes;
+}
+
 uint32_t Opcode::GetData(DataExtractor &data) const {
   uint32_t byte_size = GetByteSize();
   uint8_t swap_buf[8];
diff --git a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
index ed6047f8f4ef3..eeb6020abd73a 100644
--- a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
+++ b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
@@ -61,6 +61,8 @@ class DisassemblerLLVMC::MCDisasmInstance {
 
   uint64_t GetMCInst(const uint8_t *opcode_data, size_t opcode_data_len,
                      lldb::addr_t pc, llvm::MCInst &mc_inst) const;
+  bool GetMCInst(const uint8_t *opcode_data, size_t opcode_data_len,
+                 lldb::addr_t pc, llvm::MCInst &mc_inst, size_t &size) const;
   void PrintMCInst(llvm::MCInst &mc_inst, lldb::addr_t pc,
                    std::string &inst_string, std::string &comments_string);
   void SetStyle(bool use_hex_immed, HexImmediateStyle hex_style);
@@ -524,11 +526,11 @@ class InstructionLLVMC : public lldb_private::Instruction {
           const addr_t pc = m_address.GetFileAddress();
           llvm::MCInst inst;
 
-          const size_t inst_size =
-              mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-          if (inst_size == 0)
-            m_opcode.Clear();
-          else {
+          size_t inst_size = 0;
+          m_is_valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len,
+                                                pc, inst, inst_size);
+          m_opcode.Clear();
+          if (inst_size != 0) {
             m_opcode.SetOpcodeBytes(opcode_data, inst_size);
             m_is_valid = true;
           }
@@ -604,10 +606,11 @@ class InstructionLLVMC : public lldb_private::Instruction {
         const uint8_t *opcode_data = data.GetDataStart();
         const size_t opcode_data_len = data.GetByteSize();
         llvm::MCInst inst;
-        size_t inst_size =
-            mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-
-        if (inst_size > 0) {
+        size_t inst_size = 0;
+        bool valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc,
+                                             inst, inst_size);
+ 
+        if (valid && inst_size > 0) {
           mc_disasm_ptr->SetStyle(use_hex_immediates, hex_style);
 
           const bool saved_use_color = mc_disasm_ptr->GetUseColor();
@@ -1206,9 +1209,10 @@ class InstructionLLVMC : public lldb_private::Instruction {
     const uint8_t *opcode_data = data.GetDataStart();
     const size_t opcode_data_len = data.GetByteSize();
     llvm::MCInst inst;
-    const size_t inst_size =
-        mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-    if (inst_size == 0)
+    size_t inst_size = 0;
+    const bool valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len,
+                                                pc, inst, inst_size);
+    if (!valid)
       return;
 
     m_has_visited_instruction = true;
@@ -1337,19 +1341,18 @@ DisassemblerLLVMC::MCDisasmInstance::MCDisasmInstance(
          m_asm_info_up && m_context_up && m_disasm_up && m_instr_printer_up);
 }
 
-uint64_t DisassemblerLLVMC::MCDisasmInstance::GetMCInst(
+bool DisassemblerLLVMC::MCDisasmInstance::GetMCInst(
     const uint8_t *opcode_data, size_t opcode_data_len, lldb::addr_t pc,
-    llvm::MCInst &mc_inst) const {
+    llvm::MCInst &mc_inst, size_t &size) const {
   llvm::ArrayRef<uint8_t> data(opcode_data, opcode_data_len);
   llvm::MCDisassembler::DecodeStatus status;
 
-  uint64_t new_inst_size;
-  status = m_disasm_up->getInstruction(mc_inst, new_inst_size, data, pc,
+  status = m_disasm_up->getInstruction(mc_inst, size, data, pc,
                                        llvm::nulls());
   if (status == llvm::MCDisassembler::Success)
-    return new_inst_size;
+    return true;
   else
-    return 0;
+    return false;
 }
 
 void DisassemblerLLVMC::MCDisasmInstance::PrintMCInst(
diff --git a/lldb/source/Utility/ArchSpec.cpp b/lldb/source/Utility/ArchSpec.cpp
index 70b9800f4dade..7c71aaae6bcf2 100644
--- a/lldb/source/Utility/ArchSpec.cpp
+++ b/lldb/source/Utility/ArchSpec.cpp
@@ -228,9 +228,9 @@ static const CoreDefinition g_core_definitions[] = {
     {eByteOrderLittle, 4, 4, 4, llvm::Triple::hexagon,
      ArchSpec::eCore_hexagon_hexagonv5, "hexagonv5"},
 
-    {eByteOrderLittle, 4, 2, 4, llvm::Triple::riscv32, ArchSpec::eCore_riscv32,
+    {eByteOrderLittle, 4, 2, 8, llvm::Triple::riscv32, ArchSpec::eCore_riscv32,
      "riscv32"},
-    {eByteOrderLittle, 8, 2, 4, llvm::Triple::riscv64, ArchSpec::eCore_riscv64,
+    {eByteOrderLittle, 8, 2, 8, llvm::Triple::riscv64, ArchSpec::eCore_riscv64,
      "riscv64"},
 
     {eByteOrderLittle, 4, 4, 4, llvm::Triple::loongarch32,

>From eee204836005d45adc5ffdd41fe4d61db585006a Mon Sep 17 00:00:00 2001
From: Ted Woodward <tedw...@quicinc.com>
Date: Wed, 25 Jun 2025 14:22:28 -0700
Subject: [PATCH 2/9] [lldb] Improve disassembly of unknown instructions

LLDB uses the LLVM disassembler to determine the size of instructions and
to do the actual disassembly. Currently, if the LLVM disassembler can't
disassemble an instruction, LLDB will ignore the instruction size, assume
the instruction size is the minimum size for that device, print no useful
opcode, and print nothing for the instruction.

This patch changes this behavior to separate the instruction size and
"can't disassemble". If the LLVM disassembler knows the size, but can't
disassemble the instruction, LLDB will use that size. It will print out
the opcode, and will print "<unknown>" for the instruction. This is much
more useful to both a user and a script.
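
The effect of separating "valid" from "size" can be shown with a toy model.
Everything here is hypothetical and is not LLDB's API: the decoder takes the
size from the first byte, following the RISC-V instruction-length encoding
as I understand it, and treats an instruction as valid only if it appears
in a caller-supplied table.

```python
MIN_INST_SIZE = 2  # assumed minimum instruction size for the toy target


def riscv_length(first_byte: int) -> int:
    """Return the instruction size in bytes, or 0 if it cannot be determined."""
    if first_byte & 0b11 != 0b11:
        return 2
    if first_byte & 0b11100 != 0b11100:
        return 4
    if first_byte & 0b111111 == 0b011111:
        return 6
    if first_byte & 0b1111111 == 0b0111111:
        return 8
    return 0


def decode(data: bytes, offset: int, known: dict):
    """Return (valid, size); size may be nonzero even when valid is False."""
    size = riscv_length(data[offset])
    if size == 0 or offset + size > len(data):
        return False, 0
    return data[offset:offset + size] in known, size


def walk(data: bytes, known: dict):
    """List (offset, size, text) rows, using the decoded size when available."""
    rows = []
    offset = 0
    while offset < len(data):
        valid, size = decode(data, offset, known)
        if size == 0:
            size = MIN_INST_SIZE  # fall back only when the size is unknown
        raw = data[offset:offset + size]
        rows.append((offset, size, known[raw] if valid else "<unknown>"))
        offset += size
    return rows
```

With this split, an unrecognized 64-bit instruction occupies one 8-byte row
marked "<unknown>" instead of being broken into minimum-size pieces.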

The impetus behind this change is to clean up RISC-V disassembly when
the LLVM disassembler doesn't understand all of the instructions.
RISC-V supports proprietary extensions, where the TD files don't know
about certain instructions, and the disassembler can't disassemble them.
Internal users want to be able to disassemble these instructions.

With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch modifies LLDB's disassembly to look
more like llvm-objdump's, and includes an example python script that adds
a command "fdis" that will disassemble, then pipe the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .

Changes in this PR:
- Decouple "can't disassemble" from "instruction size".
  DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for
    valid disassembly, and has the size as an out parameter.
  Use the size even if the disassembly is invalid.
  Disassemble if disassembly is valid.

- Always print out the opcode when -b is specified.
  Previously it wouldn't print out the opcode if it couldn't disassemble.

- Print out RISC-V opcodes the way llvm-objdump does.
  Add DumpRISCV method based on RISC-V pretty printer in llvm-objdump.cpp.

- Print <unknown> for instructions that can't be disassembled, matching
  llvm-objdump, instead of printing nothing.

- Update max riscv32 and riscv64 instruction size to 8.

- Add example "fdis" command script.

Change-Id: Ie5a359d9e87a12dde79a8b5c9c7a146440a550c5
---
 lldb/examples/python/filter_disasm.py         |  4 +--
 lldb/source/Core/Disassembler.cpp             |  7 +++--
 lldb/source/Core/Opcode.cpp                   |  8 ++---
 lldb/test/Shell/Commands/Inputs/dis_filt.sh   |  5 ++++
 .../command-disassemble-riscv32-bytes.s       | 30 +++++++++++++++++++
 .../Commands/command-disassemble-x86-bytes.s  | 28 +++++++++++++++++
 llvm/docs/ReleaseNotes.md                     |  3 ++
 7 files changed, 76 insertions(+), 9 deletions(-)
 create mode 100755 lldb/test/Shell/Commands/Inputs/dis_filt.sh
 create mode 100644 lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s
 create mode 100644 lldb/test/Shell/Commands/command-disassemble-x86-bytes.s

diff --git a/lldb/examples/python/filter_disasm.py b/lldb/examples/python/filter_disasm.py
index adb3455209055..d0ce609a99dd7 100644
--- a/lldb/examples/python/filter_disasm.py
+++ b/lldb/examples/python/filter_disasm.py
@@ -20,7 +20,7 @@ def __lldb_init_module(debugger, dict):
     print("Filter program set to %s" % filter_program)
 
 
-def fdis(debugger, args, result, dict):
+def fdis(debugger, args, exe_ctx, result, dict):
     """
   Call the built in disassembler, then pass its output to a filter program
   to add in disassembly for hidden opcodes.
@@ -62,7 +62,7 @@ def fdis(debugger, args, result, dict):
         return
 
     res = lldb.SBCommandReturnObject()
-    debugger.GetCommandInterpreter().HandleCommand('disassemble -b ' + args, res)
+    debugger.GetCommandInterpreter().HandleCommand('disassemble -b ' + args, exe_ctx, res)
     if (len(res.GetError()) > 0):
         result.SetError(res.GetError())
         result.SetStatus(lldb.eReturnStatusFailed)
diff --git a/lldb/source/Core/Disassembler.cpp b/lldb/source/Core/Disassembler.cpp
index f95e446448036..5ee3fc628478e 100644
--- a/lldb/source/Core/Disassembler.cpp
+++ b/lldb/source/Core/Disassembler.cpp
@@ -653,6 +653,7 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
   }
 
   if (show_bytes) {
+    auto max_byte_width = max_opcode_byte_size * 3 + 1;
     if (m_opcode.GetType() == Opcode::eTypeBytes) {
       // x86_64 and i386 are the only ones that use bytes right now so pad out
       // the byte dump to be able to always show 15 bytes (3 chars each) plus a
@@ -661,16 +662,16 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
         // make RISC-V opcode dump look like llvm-objdump
         if (exe_ctx &&
             exe_ctx->GetTargetSP()->GetArchitecture().GetTriple().isRISCV())
-          m_opcode.DumpRISCV(&ss, max_opcode_byte_size * 3 + 1);
+          m_opcode.DumpRISCV(&ss, max_byte_width);
         else
-          m_opcode.Dump(&ss, max_opcode_byte_size * 3 + 1);
+          m_opcode.Dump(&ss, max_byte_width);
        else
         m_opcode.Dump(&ss, 15 * 3 + 1);
     } else {
       // Else, we have ARM or MIPS which can show up to a uint32_t 0x00000000
       // (10 spaces) plus two for padding...
       if (max_opcode_byte_size > 0)
-        m_opcode.Dump(&ss, max_opcode_byte_size * 3 + 1);
+        m_opcode.Dump(&ss, max_byte_width);
       else
         m_opcode.Dump(&ss, 12);
     }
diff --git a/lldb/source/Core/Opcode.cpp b/lldb/source/Core/Opcode.cpp
index dbcd18cc0d8d2..17b4f2d30e6c4 100644
--- a/lldb/source/Core/Opcode.cpp
+++ b/lldb/source/Core/Opcode.cpp
@@ -85,23 +85,23 @@ int Opcode::DumpRISCV(Stream *s, uint32_t min_byte_width) {
   if (m_type != Opcode::eTypeBytes)
     return Dump(s, min_byte_width);
 
-  // from RISCVPrettyPrinter in llvm-objdump.cpp
-  // if size % 4 == 0, print as 1 or 2 32 bit values (32 or 64 bit inst)
-  // else if size % 2 == 0, print as 1 or 3 16 bit values (16 or 48 bit inst)
-  // else fall back and print bytes
+  // Logic taken from RISCVPrettyPrinter in llvm-objdump.cpp
   for (uint32_t i = 0; i < m_data.inst.length;) {
     if (i > 0)
       s->PutChar(' ');
+    // if size % 4 == 0, print as 1 or 2 32 bit values (32 or 64 bit inst)
     if (!(m_data.inst.length % 4)) {
       s->Printf("%2.2x%2.2x%2.2x%2.2x", m_data.inst.bytes[i + 3],
                                         m_data.inst.bytes[i + 2],
                                         m_data.inst.bytes[i + 1],
                                         m_data.inst.bytes[i + 0]);
       i += 4;
+    // else if size % 2 == 0, print as 1 or 3 16 bit values (16 or 48 bit inst)
     } else if (!(m_data.inst.length % 2)) {
       s->Printf("%2.2x%2.2x", m_data.inst.bytes[i + 1],
                               m_data.inst.bytes[i + 0]);
       i += 2;
+    // else fall back and print bytes
     } else {
       s->Printf("%2.2x", m_data.inst.bytes[i]);
       ++i;
diff --git a/lldb/test/Shell/Commands/Inputs/dis_filt.sh b/lldb/test/Shell/Commands/Inputs/dis_filt.sh
new file mode 100755
index 0000000000000..5fb4e9386461f
--- /dev/null
+++ b/lldb/test/Shell/Commands/Inputs/dis_filt.sh
@@ -0,0 +1,5 @@
+#! /bin/sh
+
+echo "Fake filter start"
+cat
+echo "Fake filter end"
diff --git a/lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s b/lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s
new file mode 100644
index 0000000000000..28848b6f458f6
--- /dev/null
+++ b/lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s
@@ -0,0 +1,30 @@
+# REQUIRES: riscv
+
+# This test verifies that disassemble -b prints out the correct bytes and
+# format for standard and unknown riscv instructions of various sizes,
+# and that unknown instructions show opcodes and disassemble as "<unknown>".
+# It also tests that the fdis command from examples/python/filter_disasm.py
+# pipes the disassembly output through a simple filter program correctly.
+
+
+# RUN: llvm-mc -filetype=obj -mattr=+c --triple=riscv32-unknown-unknown %s -o %t
+# RUN: %lldb -b %t -o "command script import %S/../../../examples/python/filter_disasm.py" -o "fdis set %S/Inputs/dis_filt.sh" -o "fdis -n main" | FileCheck %s
+
+main:
+    addi   sp, sp, -0x20               # 16 bit standard instruction
+    sw     a0, -0xc(s0)                # 32 bit standard instruction
+    .insn 8, 0x2000200940003F;         # 64 bit custom instruction
+    .insn 6, 0x021F | 0x00001000 << 32 # 48 bit xqci.e.li rd=8 imm=0x1000
+    .insn 4, 0x84F940B                 # 32 bit xqci.insbi  
+    .insn 2, 0xB8F2                    # 16 bit cm.push
+
+# CHECK: Disassembly filter command (fdis) loaded
+# CHECK: Fake filter start
+# CHECK: [0x0] <+0>:   1101                     addi   sp, sp, -0x20 
+# CHECK: [0x2] <+2>:   fea42a23                 sw     a0, -0xc(s0)
+# CHECK: [0x6] <+6>:   0940003f 00200020        <unknown>
+# CHECK: [0xe] <+14>:  021f 0000 1000           <unknown>
+# CHECK: [0x14] <+20>: 084f940b                 <unknown>
+# CHECK: [0x18] <+24>: b8f2                     <unknown>
+# CHECK: Fake filter end
+
diff --git a/lldb/test/Shell/Commands/command-disassemble-x86-bytes.s b/lldb/test/Shell/Commands/command-disassemble-x86-bytes.s
new file mode 100644
index 0000000000000..c2e98a60316e2
--- /dev/null
+++ b/lldb/test/Shell/Commands/command-disassemble-x86-bytes.s
@@ -0,0 +1,28 @@
+# REQUIRES: x86
+
+# This test verifies that disassemble -b prints out the correct bytes and
+# format for x86_64 instructions of various sizes, and that an unknown
+# instruction shows the opcode and disassembles as "<unknown>"
+
+# RUN: llvm-mc -filetype=obj --triple=x86_64-unknown-unknown %s -o %t
+# RUN: %lldb -b %t -o "disassemble -b -n main" | FileCheck %s
+
+main:                                   # @main
+       subq   $0x18, %rsp
+       movl   $0x0, 0x14(%rsp)
+       movq   %rdx, 0x8(%rsp)
+       movl   %ecx, 0x4(%rsp)
+       movl   (%rsp), %eax
+        addq   $0x18, %rsp
+       retq
+        .byte  0x6 
+
+# CHECK: [0x0] <+0>:   48 83 ec 18              subq   $0x18, %rsp
+# CHECK: [0x4] <+4>:   c7 44 24 14 00 00 00 00  movl   $0x0, 0x14(%rsp)
+# CHECK: [0xc] <+12>:  48 89 54 24 08           movq   %rdx, 0x8(%rsp)
+# CHECK: [0x11] <+17>: 89 4c 24 04              movl   %ecx, 0x4(%rsp)
+# CHECK: [0x15] <+21>: 8b 04 24                 movl   (%rsp), %eax
+# CHECK: [0x18] <+24>: 48 83 c4 18              addq   $0x18, %rsp
+# CHECK: [0x1c] <+28>: c3                       retq
+# CHECK: [0x1d] <+29>: 06                       <unknown>
+
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 73ae2ee599640..672db712bd798 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -304,6 +304,9 @@ Changes to LLDB
     stop reason = SIGSEGV: sent by tkill system call (sender pid=649752, uid=2667987)
   ```
 * ELF Cores can now have their siginfo structures inspected using `thread siginfo`.
+* Changed invalid disassembly to say <unknown> instead of being blank.
+* Changed the format of opcode bytes to match llvm-objdump when disassembling
+  RISC-V with the -b option.
 
 ### Changes to lldb-dap
 

>From 2fe2115f8e58dc9f4a38804b0ac392eb6a152da8 Mon Sep 17 00:00:00 2001
From: Ted Woodward <tedw...@quicinc.com>
Date: Wed, 25 Jun 2025 14:22:28 -0700
Subject: [PATCH 3/9] [lldb] Improve disassembly of unknown instructions

LLDB uses the LLVM disassembler to determine the size of instructions and
to do the actual disassembly. Currently, if the LLVM disassembler can't
disassemble an instruction, LLDB will ignore the instruction size, assume
the instruction size is the minimum size for that device, print no useful
opcode, and print nothing for the instruction.

This patch changes this behavior to separate the instruction size and
"can't disassemble". If the LLVM disassembler knows the size, but can't
disassemble the instruction, LLDB will use that size. It will print out
the opcode, and will print "<unknown>" for the instruction. This is much
more useful to both a user and a script.

The impetus behind this change is to clean up RISC-V disassembly when
the LLVM disassembler doesn't understand all of the instructions.
RISC-V supports proprietary extensions, where the TD files don't know
about certain instructions, and the disassembler can't disassemble them.
Internal users want to be able to disassemble these instructions.

With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch modifies LLDB's disassembly to look
more like llvm-objdump's, and includes an example python script that adds
a command "fdis" that will disassemble, then pipe the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .

Changes in this PR:
- Decouple "can't disassemble" from "instruction size".
  DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for
    valid disassembly, and has the size as an out parameter.
  Use the size even if the disassembly is invalid.
  Disassemble if disassembly is valid.

- Always print out the opcode when -b is specified.
  Previously it wouldn't print out the opcode if it couldn't disassemble.

- Print out RISC-V opcodes the way llvm-objdump does.
  Add DumpRISCV method based on RISC-V pretty printer in llvm-objdump.cpp.

- Print <unknown> for instructions that can't be disassembled, matching
  llvm-objdump, instead of printing nothing.

- Update max riscv32 and riscv64 instruction size to 8.

- Add example "fdis" command script.

Change-Id: Ie5a359d9e87a12dde79a8b5c9c7a146440a550c5
---
 lldb/examples/python/filter_disasm.py | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/lldb/examples/python/filter_disasm.py b/lldb/examples/python/filter_disasm.py
index d0ce609a99dd7..08d7e2861b560 100644
--- a/lldb/examples/python/filter_disasm.py
+++ b/lldb/examples/python/filter_disasm.py
@@ -77,11 +77,8 @@ def fdis(debugger, args, exe_ctx, result, dict):
         result.SetStatus(lldb.eReturnStatusFailed)
         return
 
-    print(proc.stderr)
-    if proc.stderr:
-        pass
-        #result.SetError(proc.stderr)
-        #result.SetStatus(lldb.eReturnStatusFailed)
-    else:
-        result.PutCString(proc.stdout)
-        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
+    if proc.returncode:
+        result.PutCString("warning: {} returned non-zero value {}".format(filter_program, proc.returncode))
+
+    result.PutCString(proc.stdout)
+    result.SetStatus(lldb.eReturnStatusSuccessFinishResult)

>From 1ec2424aab0021746b477c75cba0b02d79adfa47 Mon Sep 17 00:00:00 2001
From: Ted Woodward <tedw...@quicinc.com>
Date: Wed, 25 Jun 2025 14:22:28 -0700
Subject: [PATCH 4/9] [lldb] Improve disassembly of unknown instructions

LLDB uses the LLVM disassembler to determine the size of instructions and
to do the actual disassembly. Currently, if the LLVM disassembler can't
disassemble an instruction, LLDB will ignore the instruction size, assume
the instruction size is the minimum size for that device, print no useful
opcode, and print nothing for the instruction.

This patch changes this behavior to separate the instruction size and
"can't disassemble". If the LLVM disassembler knows the size, but can't
disassemble the instruction, LLDB will use that size. It will print out
the opcode, and will print "<unknown>" for the instruction. This is much
more useful to both a user and a script.

The impetus behind this change is to clean up RISC-V disassembly when
the LLVM disassembler doesn't understand all of the instructions.
RISC-V supports proprietary extensions, where the TD files don't know
about certain instructions, and the disassembler can't disassemble them.
Internal users want to be able to disassemble these instructions.

With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch modifies LLDB's disassembly to look
more like llvm-objdump's, and includes an example python script that adds
a command "fdis" that will disassemble, then pipe the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .

Changes in this PR:
- Decouple "can't disassemble" from "instruction size".
  DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for
    valid disassembly, and has the size as an out parameter.
  Use the size even if the disassembly is invalid.
  Disassemble if disassembly is valid.

- Always print out the opcode when -b is specified.
  Previously it wouldn't print out the opcode if it couldn't disassemble.

- Print out RISC-V opcodes the way llvm-objdump does.
  Add DumpRISCV method based on RISC-V pretty printer in llvm-objdump.cpp.

- Print <unknown> for instructions that can't be disassembled, matching
  llvm-objdump, instead of printing nothing.

- Update max riscv32 and riscv64 instruction size to 8.

- Add example "fdis" command script.

Change-Id: Ie5a359d9e87a12dde79a8b5c9c7a146440a550c5
---
 lldb/test/Shell/Commands/Inputs/dis_filt.py   |  8 ++++++
 lldb/test/Shell/Commands/Inputs/dis_filt.sh   |  5 ----
 .../command-disassemble-riscv32-bytes.s       | 26 ++++++++++++-------
 .../Commands/command-disassemble-x86-bytes.s  | 14 +++++-----
 llvm/docs/ReleaseNotes.md                     |  5 ++--
 5 files changed, 34 insertions(+), 24 deletions(-)
 create mode 100755 lldb/test/Shell/Commands/Inputs/dis_filt.py
 delete mode 100755 lldb/test/Shell/Commands/Inputs/dis_filt.sh

diff --git a/lldb/test/Shell/Commands/Inputs/dis_filt.py b/lldb/test/Shell/Commands/Inputs/dis_filt.py
new file mode 100755
index 0000000000000..72732bccc106c
--- /dev/null
+++ b/lldb/test/Shell/Commands/Inputs/dis_filt.py
@@ -0,0 +1,8 @@
+#! /usr/bin/env python
+
+import sys
+
+for line in sys.stdin:
+    if '0940003f 00200020' in line and '<unknown>' in line:
+        line = line.replace('<unknown>', 'Fake64')
+    print(line, end="")
diff --git a/lldb/test/Shell/Commands/Inputs/dis_filt.sh b/lldb/test/Shell/Commands/Inputs/dis_filt.sh
deleted file mode 100755
index 5fb4e9386461f..0000000000000
--- a/lldb/test/Shell/Commands/Inputs/dis_filt.sh
+++ /dev/null
@@ -1,5 +0,0 @@
-#! /bin/sh
-
-echo "Fake filter start"
-cat
-echo "Fake filter end"
diff --git a/lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s b/lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s
index 28848b6f458f6..01b9ba261d660 100644
--- a/lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s
+++ b/lldb/test/Shell/Commands/command-disassemble-riscv32-bytes.s
@@ -8,7 +8,8 @@
 
 
 # RUN: llvm-mc -filetype=obj -mattr=+c --triple=riscv32-unknown-unknown %s -o %t
-# RUN: %lldb -b %t -o "command script import %S/../../../examples/python/filter_disasm.py" -o "fdis set %S/Inputs/dis_filt.sh" -o "fdis -n main" | FileCheck %s
+# RUN: %lldb -b %t "-o" "disassemble -b -n main" | FileCheck %s
+# RUN: %lldb -b %t -o "command script import %S/../../../examples/python/filter_disasm.py" -o "fdis set %S/Inputs/dis_filt.py" -o "fdis -n main" | FileCheck --check-prefix=FILTER %s
 
 main:
     addi   sp, sp, -0x20               # 16 bit standard instruction
@@ -18,13 +19,18 @@ main:
     .insn 4, 0x84F940B                 # 32 bit xqci.insbi  
     .insn 2, 0xB8F2                    # 16 bit cm.push
 
-# CHECK: Disassembly filter command (fdis) loaded
-# CHECK: Fake filter start
-# CHECK: [0x0] <+0>:   1101                     addi   sp, sp, -0x20 
-# CHECK: [0x2] <+2>:   fea42a23                 sw     a0, -0xc(s0)
-# CHECK: [0x6] <+6>:   0940003f 00200020        <unknown>
-# CHECK: [0xe] <+14>:  021f 0000 1000           <unknown>
-# CHECK: [0x14] <+20>: 084f940b                 <unknown>
-# CHECK: [0x18] <+24>: b8f2                     <unknown>
-# CHECK: Fake filter end
+# CHECK:      [0x0] <+0>:   1101                     addi   sp, sp, -0x20 
+# CHECK-NEXT: [0x2] <+2>:   fea42a23                 sw     a0, -0xc(s0)
+# CHECK-NEXT: [0x6] <+6>:   0940003f 00200020        <unknown>
+# CHECK-NEXT: [0xe] <+14>:  021f 0000 1000           <unknown>
+# CHECK-NEXT: [0x14] <+20>: 084f940b                 <unknown>
+# CHECK-NEXT: [0x18] <+24>: b8f2                     <unknown>
+
+# FILTER: Disassembly filter command (fdis) loaded
+# FILTER:      [0x0] <+0>:   1101                     addi   sp, sp, -0x20 
+# FILTER-NEXT: [0x2] <+2>:   fea42a23                 sw     a0, -0xc(s0)
+# FILTER-NEXT: [0x6] <+6>:   0940003f 00200020        Fake64
+# FILTER-NEXT: [0xe] <+14>:  021f 0000 1000           <unknown>
+# FILTER-NEXT: [0x14] <+20>: 084f940b                 <unknown>
+# FILTER-NEXT: [0x18] <+24>: b8f2                     <unknown>
 
diff --git a/lldb/test/Shell/Commands/command-disassemble-x86-bytes.s b/lldb/test/Shell/Commands/command-disassemble-x86-bytes.s
index c2e98a60316e2..fae08d09a0832 100644
--- a/lldb/test/Shell/Commands/command-disassemble-x86-bytes.s
+++ b/lldb/test/Shell/Commands/command-disassemble-x86-bytes.s
@@ -18,11 +18,11 @@ main:                                   # @main
         .byte  0x6 
 
 # CHECK: [0x0] <+0>:   48 83 ec 18              subq   $0x18, %rsp
-# CHECK: [0x4] <+4>:   c7 44 24 14 00 00 00 00  movl   $0x0, 0x14(%rsp)
-# CHECK: [0xc] <+12>:  48 89 54 24 08           movq   %rdx, 0x8(%rsp)
-# CHECK: [0x11] <+17>: 89 4c 24 04              movl   %ecx, 0x4(%rsp)
-# CHECK: [0x15] <+21>: 8b 04 24                 movl   (%rsp), %eax
-# CHECK: [0x18] <+24>: 48 83 c4 18              addq   $0x18, %rsp
-# CHECK: [0x1c] <+28>: c3                       retq
-# CHECK: [0x1d] <+29>: 06                       <unknown>
+# CHECK-NEXT: [0x4] <+4>:   c7 44 24 14 00 00 00 00  movl   $0x0, 0x14(%rsp)
+# CHECK-NEXT: [0xc] <+12>:  48 89 54 24 08           movq   %rdx, 0x8(%rsp)
+# CHECK-NEXT: [0x11] <+17>: 89 4c 24 04              movl   %ecx, 0x4(%rsp)
+# CHECK-NEXT: [0x15] <+21>: 8b 04 24                 movl   (%rsp), %eax
+# CHECK-NEXT: [0x18] <+24>: 48 83 c4 18              addq   $0x18, %rsp
+# CHECK-NEXT: [0x1c] <+28>: c3                       retq
+# CHECK-NEXT: [0x1d] <+29>: 06                       <unknown>
 
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 672db712bd798..80790f631a2fe 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -304,9 +304,10 @@ Changes to LLDB
     stop reason = SIGSEGV: sent by tkill system call (sender pid=649752, uid=2667987)
   ```
 * ELF Cores can now have their siginfo structures inspected using `thread siginfo`.
-* Changed invalid disassembly to say <unknown> instead of being blank.
+* Disassembly of unknown instructions now produces "<unknown>" instead of
+  nothing at all
 * Changed the format of opcode bytes to match llvm-objdump when disassembling
-  RISC-V with the -b option.
+  RISC-V code with disassemble's --byte option.
 
 ### Changes to lldb-dap
 

>From 2966de6e439c0644de376db1f51582d8b03947f4 Mon Sep 17 00:00:00 2001
From: Ted Woodward <tedw...@quicinc.com>
Date: Wed, 25 Jun 2025 14:22:28 -0700
Subject: [PATCH 5/9] [lldb] Improve disassembly of unknown instructions

LLDB uses the LLVM disassembler to determine the size of instructions and
to do the actual disassembly. Currently, if the LLVM disassembler can't
disassemble an instruction, LLDB will ignore the instruction size, assume
the instruction size is the minimum size for that device, print no useful
opcode, and print nothing for the instruction.

This patch changes this behavior to separate the instruction size and
"can't disassemble". If the LLVM disassembler knows the size, but can't
disassemble the instruction, LLDB will use that size. It will print out
the opcode, and will print "<unknown>" for the instruction. This is much
more useful to both a user and a script.
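
As an illustration only (this is not the LLDB code), the decoupling described
above can be sketched in Python. The length rule is the standard RISC-V
encoding of instruction length in the low bits of the first 16-bit parcel;
`get_inst`, `disassemble` and the `known` lookup table are hypothetical
stand-ins for GetMCInst and the LLVM decoder tables.

```python
def riscv_insn_length(first_halfword):
    """Length in bytes implied by the low bits of the first 16-bit parcel."""
    if first_halfword & 0b11 != 0b11:
        return 2
    if first_halfword & 0b11100 != 0b11100:
        return 4
    if first_halfword & 0b111111 == 0b011111:
        return 6
    if first_halfword & 0b1111111 == 0b0111111:
        return 8
    return 0  # longer or reserved encodings are not handled in this sketch


def get_inst(data, offset, known):
    """Hypothetical stand-in for GetMCInst: returns (valid, size, text)."""
    halfword = int.from_bytes(data[offset:offset + 2], "little")
    size = riscv_insn_length(halfword)
    text = known.get(bytes(data[offset:offset + size]))
    return text is not None, size, text


def disassemble(data, known):
    """Walk the buffer, advancing by the reported size even when decode fails."""
    rows = []
    offset = 0
    while offset < len(data):
        valid, size, text = get_inst(data, offset, known)
        if size == 0:
            break  # the size itself is unknown, so we cannot advance safely
        rows.append((offset, size, text if valid else "<unknown>"))
        offset += size
    return rows
```

Because the size is used whether or not decoding succeeds, the instruction
after an unknown 48-bit or 64-bit encoding still starts at the right offset.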

The impetus behind this change is to clean up RISC-V disassembly when
the LLVM disassembler doesn't understand all of the instructions.
RISC-V supports proprietary extensions, where the TD files don't know
about certain instructions, and the disassembler can't disassemble them.
Internal users want to be able to disassemble these instructions.

With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch modifies LLDB's disassembly to look
more like llvm-objdump's, and includes an example python script that adds
a command "fdis" that will disassemble, then pipe the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .

Changes in this PR:
- Decouple "can't disassemble" from "instruction size".
  DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for
    valid disassembly, and has the size as an out parameter.
  Use the size even if the disassembly is invalid.
  Disassemble if disassembly is valid.

- Always print out the opcode when -b is specified.
  Previously it wouldn't print out the opcode if it couldn't disassemble.

- Print out RISC-V opcodes the way llvm-objdump does.
  Add DumpRISCV method based on RISC-V pretty printer in llvm-objdump.cpp.

- Print <unknown> for instructions that can't be disassembled, matching
  llvm-objdump, instead of printing nothing.

- Update max riscv32 and riscv64 instruction size to 8.

- Add example "fdis" command script.

Change-Id: Ie5a359d9e87a12dde79a8b5c9c7a146440a550c5
---
 lldb/examples/python/filter_disasm.py                | 12 ++++++------
 lldb/source/Core/Disassembler.cpp                    |  2 +-
 .../Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp |  5 ++---
 lldb/test/Shell/Commands/Inputs/dis_filt.py          |  4 ++--
 4 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/lldb/examples/python/filter_disasm.py b/lldb/examples/python/filter_disasm.py
index 08d7e2861b560..74a73e292759b 100644
--- a/lldb/examples/python/filter_disasm.py
+++ b/lldb/examples/python/filter_disasm.py
@@ -13,9 +13,9 @@
 
 filter_program = "crustfilt"
 
+
 def __lldb_init_module(debugger, dict):
-    debugger.HandleCommand(
-        'command script add -f filter_disasm.fdis fdis')
+    debugger.HandleCommand("command script add -f filter_disasm.fdis fdis")
     print("Disassembly filter command (fdis) loaded")
     print("Filter program set to %s" % filter_program)
 
@@ -47,22 +47,22 @@ def fdis(debugger, args, exe_ctx, result, dict):
     """
 
     global filter_program
-    args_list = args.split(' ')
+    args_list = args.split(" ")
     result.Clear()
 
-    if len(args_list) == 1 and args_list[0] == 'get':
+    if len(args_list) == 1 and args_list[0] == "get":
         result.PutCString(filter_program)
         result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
         return
 
-    if len(args_list) == 2 and args_list[0] == 'set':
+    if len(args_list) == 2 and args_list[0] == "set":
         filter_program = args_list[1]
         result.PutCString("Filter program set to %s" % filter_program)
         result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
         return
 
     res = lldb.SBCommandReturnObject()
-    debugger.GetCommandInterpreter().HandleCommand('disassemble -b ' + args, exe_ctx, res)
+    debugger.GetCommandInterpreter().HandleCommand("disassemble -b " + args, exe_ctx, res)
     if (len(res.GetError()) > 0):
         result.SetError(res.GetError())
         result.SetStatus(lldb.eReturnStatusFailed)
diff --git a/lldb/source/Core/Disassembler.cpp b/lldb/source/Core/Disassembler.cpp
index 5ee3fc628478e..3c12312778d1b 100644
--- a/lldb/source/Core/Disassembler.cpp
+++ b/lldb/source/Core/Disassembler.cpp
@@ -665,7 +665,7 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
           m_opcode.DumpRISCV(&ss, max_byte_width);
         else
           m_opcode.Dump(&ss, max_byte_width);
-       else
+      else
         m_opcode.Dump(&ss, 15 * 3 + 1);
     } else {
       // Else, we have ARM or MIPS which can show up to a uint32_t 0x00000000
diff --git a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
index eeb6020abd73a..35d06491a4c3d 100644
--- a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
+++ b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
@@ -608,7 +608,7 @@ class InstructionLLVMC : public lldb_private::Instruction {
         llvm::MCInst inst;
         size_t inst_size = 0;
         bool valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc,
-                                             inst, inst_size);
+                                              inst, inst_size);
  
         if (valid && inst_size > 0) {
           mc_disasm_ptr->SetStyle(use_hex_immediates, hex_style);
@@ -1347,8 +1347,7 @@ bool DisassemblerLLVMC::MCDisasmInstance::GetMCInst(
   llvm::ArrayRef<uint8_t> data(opcode_data, opcode_data_len);
   llvm::MCDisassembler::DecodeStatus status;
 
-  status = m_disasm_up->getInstruction(mc_inst, size, data, pc,
-                                       llvm::nulls());
+  status = m_disasm_up->getInstruction(mc_inst, size, data, pc, llvm::nulls());
   if (status == llvm::MCDisassembler::Success)
     return true;
   else
diff --git a/lldb/test/Shell/Commands/Inputs/dis_filt.py b/lldb/test/Shell/Commands/Inputs/dis_filt.py
index 72732bccc106c..21e56ae438392 100755
--- a/lldb/test/Shell/Commands/Inputs/dis_filt.py
+++ b/lldb/test/Shell/Commands/Inputs/dis_filt.py
@@ -3,6 +3,6 @@
 import sys
 
 for line in sys.stdin:
-    if '0940003f 00200020' in line and '<unknown>' in line:
-        line = line.replace('<unknown>', 'Fake64')
+    if "0940003f 00200020" in line and "<unknown>" in line:
+        line = line.replace("<unknown>", "Fake64")
     print(line, end="")

>From afedb0f013b295962e26635a7e710534e0c93c1f Mon Sep 17 00:00:00 2001
From: Ted Woodward <tedw...@quicinc.com>
Date: Wed, 25 Jun 2025 14:22:28 -0700
Subject: [PATCH 6/9] [lldb] Improve disassembly of unknown instructions

LLDB uses the LLVM disassembler to determine the size of instructions and
to do the actual disassembly. Currently, if the LLVM disassembler can't
disassemble an instruction, LLDB will ignore the instruction size, assume
the instruction size is the minimum size for that device, print no useful
opcode, and print nothing for the instruction.

This patch changes this behavior to separate the instruction size and
"can't disassemble". If the LLVM disassembler knows the size, but can't
disassemble the instruction, LLDB will use that size. It will print out
the opcode, and will print "<unknown>" for the instruction. This is much
more useful to both a user and a script.

The impetus behind this change is to clean up RISC-V disassembly when
the LLVM disassembler doesn't understand all of the instructions.
RISC-V supports proprietary extensions, where the TD files don't know
about certain instructions, and the disassembler can't disassemble them.
Internal users want to be able to disassemble these instructions.

With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch modifies LLDB's disassembly to look
more like llvm-objdump's, and includes an example python script that adds
a command "fdis" that will disassemble, then pipe the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .
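
For readers unfamiliar with the filter approach, here is a minimal sketch of
piping disassembly text through an external program. It is not the code in
filter_disasm.py; `pipe_through_filter` and `DEMO_FILTER` are hypothetical
names, and the demo filter is a one-line stand-in rather than crustfilt.

```python
import subprocess
import sys


def pipe_through_filter(disassembly, filter_argv):
    """Feed disassembly text to a filter on stdin and return its stdout."""
    proc = subprocess.run(
        filter_argv,
        input=disassembly,
        capture_output=True,
        text=True,
        check=True,  # raise if the filter exits with a non-zero status
    )
    return proc.stdout


# Stand-in filter for demonstration: rewrites every "<unknown>" marker.
DEMO_FILTER = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().replace('<unknown>', 'Fake64'))",
]
```

A real filter would typically key on the opcode column, which is why the
opcode is printed even when the instruction cannot be disassembled.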

Changes in this PR:
- Decouple "can't disassemble" from "instruction size".
  DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for
    valid disassembly, and has the size as an out parameter.
  Use the size even if the disassembly is invalid.
  Disassemble if disassembly is valid.

- Always print out the opcode when -b is specified.
  Previously it wouldn't print out the opcode if it couldn't disassemble.

- Print out RISC-V opcodes the way llvm-objdump does.
  Add DumpRISCV method based on RISC-V pretty printer in llvm-objdump.cpp.

- Print <unknown> for instructions that can't be disassembled, matching
  llvm-objdump, instead of printing nothing.

- Update max riscv32 and riscv64 instruction size to 8.

- Add example "fdis" command script.

Change-Id: Ie5a359d9e87a12dde79a8b5c9c7a146440a550c5
---
 lldb/examples/python/filter_disasm.py       | 2 +-
 lldb/test/Shell/Commands/Inputs/dis_filt.py | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lldb/examples/python/filter_disasm.py b/lldb/examples/python/filter_disasm.py
index 74a73e292759b..de99d4031a7fd 100644
--- a/lldb/examples/python/filter_disasm.py
+++ b/lldb/examples/python/filter_disasm.py
@@ -63,7 +63,7 @@ def fdis(debugger, args, exe_ctx, result, dict):
 
     res = lldb.SBCommandReturnObject()
     debugger.GetCommandInterpreter().HandleCommand("disassemble -b " + args, exe_ctx, res)
-    if (len(res.GetError()) > 0):
+    if len(res.GetError()) > 0:
         result.SetError(res.GetError())
         result.SetStatus(lldb.eReturnStatusFailed)
         return
diff --git a/lldb/test/Shell/Commands/Inputs/dis_filt.py b/lldb/test/Shell/Commands/Inputs/dis_filt.py
index 21e56ae438392..bac5a36be2f3c 100755
--- a/lldb/test/Shell/Commands/Inputs/dis_filt.py
+++ b/lldb/test/Shell/Commands/Inputs/dis_filt.py
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#! /usr/bin/env python3
 
 import sys
 

>From 9cf0fa3894608300fe040132f80ce26ed4be7cac Mon Sep 17 00:00:00 2001
From: Ted Woodward <tedw...@quicinc.com>
Date: Wed, 25 Jun 2025 14:22:28 -0700
Subject: [PATCH 7/9] [lldb] Improve disassembly of unknown instructions

LLDB uses the LLVM disassembler to determine the size of instructions and
to do the actual disassembly. Currently, if the LLVM disassembler can't
disassemble an instruction, LLDB will ignore the instruction size, assume
the instruction size is the minimum size for that device, print no useful
opcode, and print nothing for the instruction.

This patch changes this behavior to separate the instruction size and
"can't disassemble". If the LLVM disassembler knows the size, but can't
disassemble the instruction, LLDB will use that size. It will print out
the opcode, and will print "<unknown>" for the instruction. This is much
more useful to both a user and a script.

The impetus behind this change is to clean up RISC-V disassembly when
the LLVM disassembler doesn't understand all of the instructions.
RISC-V supports proprietary extensions, where the TD files don't know
about certain instructions, and the disassembler can't disassemble them.
Internal users want to be able to disassemble these instructions.

With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch modifies LLDB's disassembly to look
more like llvm-objdump's, and includes an example python script that adds
a command "fdis" that will disassemble, then pipe the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .

Changes in this PR:
- Decouple "can't disassemble" from "instruction size".
  DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for
    valid disassembly, and has the size as an out parameter.
  Use the size even if the disassembly is invalid.
  Disassemble if disassembly is valid.

- Always print out the opcode when -b is specified.
  Previously it wouldn't print out the opcode if it couldn't disassemble.

- Print out RISC-V opcodes the way llvm-objdump does.
  Code for the new Opcode Type eType16_32Tuples by Jason Molenda.

- Print <unknown> for instructions that can't be disassembled, matching
  llvm-objdump, instead of printing nothing.

- Update max riscv32 and riscv64 instruction size to 8.

- Add example "fdis" command script.

- Added disassembly byte test for x86 with known and unknown instructions.
  Added disassembly byte test for riscv32 with known and unknown instructions,
  with and without filtering.
  Added test from Jason Molenda to RISC-V disassembly unit tests.
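
The opcode formatting used for eType16_32Tuples can be sketched in Python as
follows. This is an illustration of the rule, not the LLDB implementation;
`format_riscv_opcode` is a hypothetical name, and the sketch assumes
little-endian instruction bytes of even length.

```python
def format_riscv_opcode(raw):
    """Format little-endian instruction bytes the way llvm-objdump prints them.

    Lengths divisible by 4 are printed as 32-bit words; other (even) lengths
    are printed as 16-bit halfwords. Each group is shown most significant
    byte first.
    """
    group = 4 if len(raw) % 4 == 0 else 2
    chunks = (raw[i:i + group] for i in range(0, len(raw), group))
    return " ".join(chunk[::-1].hex() for chunk in chunks)
```

With this rule a 64-bit instruction prints as two words and a 48-bit
instruction as three halfwords, matching the expected strings in the tests.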

Change-Id: Ie5a359d9e87a12dde79a8b5c9c7a146440a550c5
---
 lldb/include/lldb/Core/Opcode.h               | 40 ++++++++---
 lldb/source/Core/Disassembler.cpp             | 10 +--
 lldb/source/Core/Opcode.cpp                   | 66 ++++++++-----------
 .../Disassembler/LLVMC/DisassemblerLLVMC.cpp  | 16 +++--
 .../RISCV/TestMCDisasmInstanceRISCV.cpp       | 61 +++++++++++++++++
 5 files changed, 134 insertions(+), 59 deletions(-)

diff --git a/lldb/include/lldb/Core/Opcode.h b/lldb/include/lldb/Core/Opcode.h
index 88ef17093d3f3..91af15c62e6ab 100644
--- a/lldb/include/lldb/Core/Opcode.h
+++ b/lldb/include/lldb/Core/Opcode.h
@@ -32,7 +32,10 @@ class Opcode {
     eTypeInvalid,
     eType8,
     eType16,
-    eType16_2, // a 32-bit Thumb instruction, made up of two words
+    eType16_2,        // a 32-bit Thumb instruction, made up of two words
+    eType16_32Tuples, // RISC-V that can have 2, 4, 6, 8 etc byte long
+                      // instructions which will be printed in combinations of
+                      // 16 & 32-bit words.
     eType32,
     eType64,
     eTypeBytes
@@ -60,9 +63,9 @@ class Opcode {
     m_data.inst64 = inst;
   }
 
-  Opcode(uint8_t *bytes, size_t length)
-      : m_byte_order(lldb::eByteOrderInvalid) {
-    SetOpcodeBytes(bytes, length);
+  Opcode(uint8_t *bytes, size_t length, Opcode::Type type,
+         lldb::ByteOrder order) {
+    DoSetOpcodeBytes(bytes, length, type, order);
   }
 
   void Clear() {
@@ -82,6 +85,8 @@ class Opcode {
       break;
     case Opcode::eType16_2:
       break;
+    case Opcode::eType16_32Tuples:
+      break;
     case Opcode::eType32:
       break;
     case Opcode::eType64:
@@ -103,6 +108,8 @@ class Opcode {
                              : m_data.inst16;
     case Opcode::eType16_2:
       break;
+    case Opcode::eType16_32Tuples:
+      break;
     case Opcode::eType32:
       break;
     case Opcode::eType64:
@@ -122,6 +129,8 @@ class Opcode {
     case Opcode::eType16:
       return GetEndianSwap() ? llvm::byteswap<uint16_t>(m_data.inst16)
                              : m_data.inst16;
+    case Opcode::eType16_32Tuples:
+      break;
     case Opcode::eType16_2: // passthrough
     case Opcode::eType32:
       return GetEndianSwap() ? llvm::byteswap<uint32_t>(m_data.inst32)
@@ -143,6 +152,8 @@ class Opcode {
     case Opcode::eType16:
       return GetEndianSwap() ? llvm::byteswap<uint16_t>(m_data.inst16)
                              : m_data.inst16;
+    case Opcode::eType16_32Tuples:
+      break;
     case Opcode::eType16_2: // passthrough
     case Opcode::eType32:
       return GetEndianSwap() ? llvm::byteswap<uint32_t>(m_data.inst32)
@@ -186,21 +197,30 @@ class Opcode {
     m_byte_order = order;
   }
 
+  void SetOpcode16_32TupleBytes(const void *bytes, size_t length,
+                                lldb::ByteOrder order) {
+    DoSetOpcodeBytes(bytes, length, eType16_32Tuples, order);
+  }
+
   void SetOpcodeBytes(const void *bytes, size_t length) {
+    DoSetOpcodeBytes(bytes, length, eTypeBytes, lldb::eByteOrderInvalid);
+  }
+
+  void DoSetOpcodeBytes(const void *bytes, size_t length, Opcode::Type type,
+                        lldb::ByteOrder order) {
     if (bytes != nullptr && length > 0) {
-      m_type = eTypeBytes;
+      m_type = type;
       m_data.inst.length = length;
       assert(length < sizeof(m_data.inst.bytes));
       memcpy(m_data.inst.bytes, bytes, length);
-      m_byte_order = lldb::eByteOrderInvalid;
+      m_byte_order = order;
     } else {
       m_type = eTypeInvalid;
       m_data.inst.length = 0;
     }
   }
 
-  int Dump(Stream *s, uint32_t min_byte_width);
-  int DumpRISCV(Stream *s, uint32_t min_byte_width);
+  int Dump(Stream *s, uint32_t min_byte_width) const;
 
   const void *GetOpcodeBytes() const {
     return ((m_type == Opcode::eTypeBytes) ? m_data.inst.bytes : nullptr);
@@ -214,6 +234,8 @@ class Opcode {
       return sizeof(m_data.inst8);
     case Opcode::eType16:
       return sizeof(m_data.inst16);
+    case Opcode::eType16_32Tuples:
+      return m_data.inst.length;
     case Opcode::eType16_2: // passthrough
     case Opcode::eType32:
       return sizeof(m_data.inst32);
@@ -239,6 +261,8 @@ class Opcode {
       return &m_data.inst8;
     case Opcode::eType16:
       return &m_data.inst16;
+    case Opcode::eType16_32Tuples:
+      return m_data.inst.bytes;
     case Opcode::eType16_2: // passthrough
     case Opcode::eType32:
       return &m_data.inst32;
diff --git a/lldb/source/Core/Disassembler.cpp b/lldb/source/Core/Disassembler.cpp
index 3c12312778d1b..c65c5efe55657 100644
--- a/lldb/source/Core/Disassembler.cpp
+++ b/lldb/source/Core/Disassembler.cpp
@@ -653,25 +653,19 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
   }
 
   if (show_bytes) {
-    auto max_byte_width = max_opcode_byte_size * 3 + 1;
     if (m_opcode.GetType() == Opcode::eTypeBytes) {
       // x86_64 and i386 are the only ones that use bytes right now so pad out
       // the byte dump to be able to always show 15 bytes (3 chars each) plus a
       // space
       if (max_opcode_byte_size > 0)
-        // make RISC-V opcode dump look like llvm-objdump
-        if (exe_ctx &&
-            exe_ctx->GetTargetSP()->GetArchitecture().GetTriple().isRISCV())
-          m_opcode.DumpRISCV(&ss, max_byte_width);
-        else
-          m_opcode.Dump(&ss, max_byte_width);
+        m_opcode.Dump(&ss, max_opcode_byte_size * 3 + 1);
       else
         m_opcode.Dump(&ss, 15 * 3 + 1);
     } else {
       // Else, we have ARM or MIPS which can show up to a uint32_t 0x00000000
       // (10 spaces) plus two for padding...
       if (max_opcode_byte_size > 0)
-        m_opcode.Dump(&ss, max_byte_width);
+        m_opcode.Dump(&ss, max_opcode_byte_size * 3 + 1);
       else
         m_opcode.Dump(&ss, 12);
     }
diff --git a/lldb/source/Core/Opcode.cpp b/lldb/source/Core/Opcode.cpp
index 17b4f2d30e6c4..97b938f8d919b 100644
--- a/lldb/source/Core/Opcode.cpp
+++ b/lldb/source/Core/Opcode.cpp
@@ -21,7 +21,7 @@
 using namespace lldb;
 using namespace lldb_private;
 
-int Opcode::Dump(Stream *s, uint32_t min_byte_width) {
+int Opcode::Dump(Stream *s, uint32_t min_byte_width) const {
   const uint32_t previous_bytes = s->GetWrittenBytes();
   switch (m_type) {
   case Opcode::eTypeInvalid:
@@ -38,6 +38,28 @@ int Opcode::Dump(Stream *s, uint32_t min_byte_width) {
     s->Printf("0x%8.8x", m_data.inst32);
     break;
 
+  case Opcode::eType16_32Tuples: {
+    const bool format_as_words = (m_data.inst.length % 4) == 0;
+    uint32_t i = 0;
+    while (i < m_data.inst.length) {
+      if (i > 0)
+        s->PutChar(' ');
+      if (format_as_words) {
+        // Format as words; print 1 or more UInt32 values.
+        s->Printf("%2.2x%2.2x%2.2x%2.2x", m_data.inst.bytes[i + 3],
+                                          m_data.inst.bytes[i + 2],
+                                          m_data.inst.bytes[i + 1],
+                                          m_data.inst.bytes[i + 0]);
+        i += 4;
+      } else {
+        // Format as halfwords; print 1 or more UInt16 values.
+        s->Printf("%2.2x%2.2x", m_data.inst.bytes[i + 1],
+                                m_data.inst.bytes[i + 0]);
+        i += 2;
+      }
+    }
+  } break;
+
   case Opcode::eType64:
     s->Printf("0x%16.16" PRIx64, m_data.inst64);
     break;
@@ -69,6 +91,7 @@ lldb::ByteOrder Opcode::GetDataByteOrder() const {
   case Opcode::eType8:
   case Opcode::eType16:
   case Opcode::eType16_2:
+  case Opcode::eType16_32Tuples:
   case Opcode::eType32:
   case Opcode::eType64:
     return endian::InlHostByteOrder();
@@ -78,44 +101,6 @@ lldb::ByteOrder Opcode::GetDataByteOrder() const {
   return eByteOrderInvalid;
 }
 
-// make RISC-V byte dumps look like llvm-objdump, instead of just dumping bytes
-int Opcode::DumpRISCV(Stream *s, uint32_t min_byte_width) {
-  const uint32_t previous_bytes = s->GetWrittenBytes();
-  // if m_type is not bytes, call Dump
-  if (m_type != Opcode::eTypeBytes)
-    return Dump(s, min_byte_width);
-
-  // Logic taken from from RISCVPrettyPrinter in llvm-objdump.cpp
-  for (uint32_t i = 0; i < m_data.inst.length;) {
-    if (i > 0)
-      s->PutChar(' ');
-    // if size % 4 == 0, print as 1 or 2 32 bit values (32 or 64 bit inst)
-    if (!(m_data.inst.length % 4)) {
-      s->Printf("%2.2x%2.2x%2.2x%2.2x", m_data.inst.bytes[i + 3],
-                                        m_data.inst.bytes[i + 2],
-                                        m_data.inst.bytes[i + 1],
-                                        m_data.inst.bytes[i + 0]);
-      i += 4;
-    // else if size % 2 == 0, print as 1 or 3 16 bit values (16 or 48 bit inst)
-    } else if (!(m_data.inst.length % 2)) {
-      s->Printf("%2.2x%2.2x", m_data.inst.bytes[i + 1],
-                              m_data.inst.bytes[i + 0]);
-      i += 2;
-    // else fall back and print bytes
-    } else {
-      s->Printf("%2.2x", m_data.inst.bytes[i]);
-      ++i;
-    }
-  }
-
-  uint32_t bytes_written_so_far = s->GetWrittenBytes() - previous_bytes;
-  // Add spaces to make sure bytes display comes out even in case opcodes aren't
-  // all the same size.
-  if (bytes_written_so_far < min_byte_width)
-    s->Printf("%*s", min_byte_width - bytes_written_so_far, "");
-  return s->GetWrittenBytes() - previous_bytes;
-}
-
 uint32_t Opcode::GetData(DataExtractor &data) const {
   uint32_t byte_size = GetByteSize();
   uint8_t swap_buf[8];
@@ -151,6 +136,9 @@ uint32_t Opcode::GetData(DataExtractor &data) const {
         swap_buf[3] = m_data.inst.bytes[2];
         buf = swap_buf;
         break;
+      case Opcode::eType16_32Tuples:
+        buf = GetOpcodeDataBytes();
+        break;
       case Opcode::eType32:
         *(uint32_t *)swap_buf = llvm::byteswap<uint32_t>(m_data.inst32);
         buf = swap_buf;
diff --git a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
index 35d06491a4c3d..ebfdc6f2cb280 100644
--- a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
+++ b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
@@ -488,8 +488,13 @@ class InstructionLLVMC : public lldb_private::Instruction {
           break;
 
         default:
-          m_opcode.SetOpcodeBytes(data.PeekData(data_offset, min_op_byte_size),
-                                  min_op_byte_size);
+          if (arch.GetTriple().isRISCV())
+            m_opcode.SetOpcode16_32TupleBytes(
+                data.PeekData(data_offset, min_op_byte_size), min_op_byte_size,
+                byte_order);
+          else
+            m_opcode.SetOpcodeBytes(
+                data.PeekData(data_offset, min_op_byte_size), min_op_byte_size);
           got_op = true;
           break;
         }
@@ -531,8 +536,11 @@ class InstructionLLVMC : public lldb_private::Instruction {
                                                 pc, inst, inst_size);
           m_opcode.Clear();
           if (inst_size != 0) {
-            m_opcode.SetOpcodeBytes(opcode_data, inst_size);
-            m_is_valid = true;
+            if (arch.GetTriple().isRISCV())
+              m_opcode.SetOpcode16_32TupleBytes(opcode_data, inst_size,
+                                                byte_order);
+            else
+              m_opcode.SetOpcodeBytes(opcode_data, inst_size);
           }
         }
       }
diff --git a/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp b/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp
index 8ec5d62a99ac5..bedda863a2ae9 100644
--- a/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp
+++ b/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp
@@ -14,6 +14,7 @@
 #include "lldb/Core/Disassembler.h"
 #include "lldb/Target/ExecutionContext.h"
 #include "lldb/Utility/ArchSpec.h"
+#include "lldb/Utility/StreamString.h"
 
 #include "Plugins/Disassembler/LLVMC/DisassemblerLLVMC.h"
 
@@ -90,3 +91,63 @@ TEST_F(TestMCDisasmInstanceRISCV, TestRISCV32Instruction) {
   EXPECT_FALSE(inst_sp->IsCall());
   EXPECT_TRUE(inst_sp->DoesBranch());
 }
+
+TEST_F(TestMCDisasmInstanceRISCV, TestOpcodeBytePrinter) {
+  ArchSpec arch("riscv32-*-linux");
+
+  const unsigned num_of_instructions = 7;
+  // clang-format off
+  uint8_t data[] = {
+      0x41, 0x11,             // addi   sp, sp, -0x10
+      0x06, 0xc6,             // sw     ra, 0xc(sp)
+      0x23, 0x2a, 0xa4, 0xfe, // sw     a0, -0xc(s0)
+      0x23, 0x28, 0xa4, 0xfe, // sw     a0, -0x10(s0)
+      0x22, 0x44,             // lw     s0, 0x8(sp)
+
+      0x3f, 0x00, 0x40, 0x09, // Fake 64-bit instruction
+      0x20, 0x00, 0x20, 0x00,
+
+      0x1f, 0x02,             // 48 bit xqci.e.li rd=8 imm=0x1000
+      0x00, 0x00, 
+      0x00, 0x10,
+  };
+  // clang-format on
+
+  // clang-format off
+  const char *expected_outputs[] = {
+    "1141",
+    "c606",
+    "fea42a23",
+    "fea42823",
+    "4422",
+    "0940003f 00200020",
+    "021f 0000 1000"
+  };
+  // clang-format on
+  const unsigned num_of_expected_outputs = sizeof(expected_outputs) / sizeof(char *);
+
+  EXPECT_EQ(num_of_instructions, num_of_expected_outputs);
+
+  DisassemblerSP disass_sp;
+  Address start_addr(0x100);
+  disass_sp = Disassembler::DisassembleBytes(
+      arch, nullptr, nullptr, nullptr, nullptr, start_addr, &data, sizeof(data),
+      num_of_instructions, false);
+
+  // If we failed to get a disassembler, we can assume it is because
+  // the llvm we linked against was not built with the riscv target,
+  // and we should skip these tests without marking anything as failing.
+  if (!disass_sp)
+    return;
+
+  const InstructionList inst_list(disass_sp->GetInstructionList());
+  EXPECT_EQ(num_of_instructions, inst_list.GetSize());
+
+  for (size_t i = 0; i < num_of_instructions; i++) {
+    InstructionSP inst_sp;
+    StreamString s;
+    inst_sp = inst_list.GetInstructionAtIndex(i);
+    inst_sp->GetOpcode().Dump(&s, 1);
+    ASSERT_STREQ(s.GetString().str().c_str(), expected_outputs[i]);
+  }
+}
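
The expected strings in TestOpcodeBytePrinter suggest a simple grouping rule, sketched below in Python. This is an illustration inferred from the test data, not the code in Opcode.cpp: the function name `format_opcode_tuples` is hypothetical, and the assumption is that an opcode whose length is a multiple of 4 is printed as little-endian 32-bit words, while any other even length is printed as little-endian 16-bit halfwords.

```python
def format_opcode_tuples(data: bytes) -> str:
    """Format raw little-endian opcode bytes the way the test above expects.

    Assumption (inferred from the expected outputs, not from the LLDB source):
    lengths divisible by 4 print as 32-bit words, other even lengths print as
    16-bit halfwords, and groups are separated by single spaces.
    """
    if len(data) == 0 or len(data) % 2 != 0:
        raise ValueError("expected a non-empty, even number of opcode bytes")
    unit = 4 if len(data) % 4 == 0 else 2
    groups = []
    for i in range(0, len(data), unit):
        # Reverse each group so the most significant byte is printed first.
        groups.append(data[i:i + unit][::-1].hex())
    return " ".join(groups)


if __name__ == "__main__":
    # The 48-bit instruction from the test data prints as three halfwords.
    print(format_opcode_tuples(bytes([0x1F, 0x02, 0x00, 0x00, 0x00, 0x10])))
```

Feeding the function the byte sequences from the `data` array reproduces the entries of `expected_outputs`, which is a quick way to sanity-check new test vectors before adding them.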

>From 63f417ce498a20cc4186daa49ec0b6848c0dfa97 Mon Sep 17 00:00:00 2001
From: Ted Woodward <tedw...@quicinc.com>
Date: Wed, 25 Jun 2025 14:22:28 -0700
Subject: [PATCH 8/9] [lldb] Improve disassembly of unknown instructions

LLDB uses the LLVM disassembler to determine the size of instructions and
to do the actual disassembly. Currently, if the LLVM disassembler can't
disassemble an instruction, LLDB will ignore the instruction size, assume
the instruction size is the minimum size for that device, print no useful
opcode, and print nothing for the instruction.

This patch changes this behavior to separate the instruction size and
"can't disassemble". If the LLVM disassembler knows the size, but can't
disassemble the instruction, LLDB will use that size. It will print out
the opcode, and will print "<unknown>" for the instruction. This is much
more useful to both a user and a script.
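
As a rough illustration of the decoupling described above, here is a minimal Python sketch. It is not LLDB code: `riscv_inst_size`, `get_inst`, `disassemble`, and the `known` lookup table (standing in for the LLVM decoder) are all hypothetical names. The size rule follows the standard RISC-V instruction-length encoding for 16-, 32-, 48-, and 64-bit instructions; longer encodings are treated as unknown size here.

```python
from typing import Dict, List, Tuple


def riscv_inst_size(first_halfword: int) -> int:
    """Return the instruction size in bytes, or 0 if this sketch cannot tell."""
    if first_halfword & 0b11 != 0b11:
        return 2  # compressed 16-bit instruction
    if first_halfword & 0b11100 != 0b11100:
        return 4  # standard 32-bit instruction
    if first_halfword & 0b111111 == 0b011111:
        return 6  # 48-bit instruction
    if first_halfword & 0b1111111 == 0b0111111:
        return 8  # 64-bit instruction
    return 0


def get_inst(data: bytes, known: Dict[bytes, str]) -> Tuple[bool, int, str]:
    """Return (valid, size, text); size is reported even when valid is False."""
    if len(data) < 2:
        return False, 0, "<unknown>"
    size = riscv_inst_size(data[0] | (data[1] << 8))
    if size == 0 or size > len(data):
        return False, 0, "<unknown>"
    text = known.get(bytes(data[:size]))
    return text is not None, size, text if text is not None else "<unknown>"


def disassemble(data: bytes, known: Dict[bytes, str],
                min_size: int = 2) -> List[Tuple[int, int, str]]:
    """Walk the buffer, advancing by the decoded size even for unknown opcodes."""
    out = []
    offset = 0
    while offset < len(data):
        _valid, size, text = get_inst(data[offset:], known)
        if size == 0:
            size = min_size  # fall back only when the size itself is unknown
        out.append((offset, size, text))
        offset += size
    return out
```

The point of the sketch is that an undecodable 64-bit instruction advances the cursor by 8 bytes, so the instructions that follow it stay aligned instead of being decoded from the middle of the unknown one.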

The impetus behind this change is to clean up RISC-V disassembly when
the LLVM disassembler doesn't understand all of the instructions.
RISC-V supports proprietary extensions, where the TD files don't know
about certain instructions, and the disassembler can't disassemble them.
Internal users want to be able to disassemble these instructions.

With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch modifies LLDB's disassembly to look
more like llvm-objdump's, and includes an example python script that adds
a command "fdis" that will disassemble, then pipe the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .

Changes in this PR:
- Decouple "can't disassemble" from "instruction size".
  DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for
    valid disassembly, and has the size as an out parameter.
  Use the size even if the disassembly is invalid.
  Disassemble if disassembly is valid.

- Always print out the opcode when -b is specified.
  Previously it wouldn't print out the opcode if it couldn't disassemble.

- Print out RISC-V opcodes the way llvm-objdump does.
  Code for the new Opcode Type eType16_32Tuples by Jason Molenda.

- Print <unknown> for instructions that can't be disassembled, matching
  llvm-objdump, instead of printing nothing.

- Update max riscv32 and riscv64 instruction size to 8.

- Add example "fdis" command script.

- Added disassembly byte test for x86 with known and unknown instructions.
  Added disassembly byte test for riscv32 with known and unknown instructions,
  with and without filtering.
  Added test from Jason Molenda to RISC-V disassembly unit tests.

Change-Id: Ie5a359d9e87a12dde79a8b5c9c7a146440a550c5
---
 lldb/source/Core/Disassembler.cpp                        | 3 +--
 lldb/source/Core/Opcode.cpp                              | 7 +++----
 .../Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp     | 9 +++++----
 .../Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp     | 3 ++-
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/lldb/source/Core/Disassembler.cpp b/lldb/source/Core/Disassembler.cpp
index c65c5efe55657..925de2a5c836c 100644
--- a/lldb/source/Core/Disassembler.cpp
+++ b/lldb/source/Core/Disassembler.cpp
@@ -685,8 +685,7 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
     }
   }
   const size_t opcode_pos = ss.GetSizeOfLastLine();
-  std::string &opcode_name =
-      show_color ? m_markup_opcode_name : m_opcode_name;
+  std::string &opcode_name = show_color ? m_markup_opcode_name : m_opcode_name;
   const std::string &mnemonics = show_color ? m_markup_mnemonics : m_mnemonics;
 
   if (opcode_name.empty())
diff --git a/lldb/source/Core/Opcode.cpp b/lldb/source/Core/Opcode.cpp
index 97b938f8d919b..6c9ced9c11230 100644
--- a/lldb/source/Core/Opcode.cpp
+++ b/lldb/source/Core/Opcode.cpp
@@ -47,14 +47,13 @@ int Opcode::Dump(Stream *s, uint32_t min_byte_width) const {
       if (format_as_words) {
         // Format as words; print 1 or more UInt32 values.
         s->Printf("%2.2x%2.2x%2.2x%2.2x", m_data.inst.bytes[i + 3],
-                                          m_data.inst.bytes[i + 2],
-                                          m_data.inst.bytes[i + 1],
-                                          m_data.inst.bytes[i + 0]);
+                  m_data.inst.bytes[i + 2], m_data.inst.bytes[i + 1],
+                  m_data.inst.bytes[i + 0]);
         i += 4;
       } else {
         // Format as halfwords; print 1 or more UInt16 values.
         s->Printf("%2.2x%2.2x", m_data.inst.bytes[i + 1],
-                                m_data.inst.bytes[i + 0]);
+                  m_data.inst.bytes[i + 0]);
         i += 2;
       }
     }
diff --git a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
index ebfdc6f2cb280..99e841107cbc2 100644
--- a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
+++ b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
@@ -617,7 +617,7 @@ class InstructionLLVMC : public lldb_private::Instruction {
         size_t inst_size = 0;
         bool valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc,
                                               inst, inst_size);
- 
+
         if (valid && inst_size > 0) {
           mc_disasm_ptr->SetStyle(use_hex_immediates, hex_style);
 
@@ -1349,9 +1349,10 @@ DisassemblerLLVMC::MCDisasmInstance::MCDisasmInstance(
          m_asm_info_up && m_context_up && m_disasm_up && m_instr_printer_up);
 }
 
-bool DisassemblerLLVMC::MCDisasmInstance::GetMCInst(
-    const uint8_t *opcode_data, size_t opcode_data_len, lldb::addr_t pc,
-    llvm::MCInst &mc_inst, size_t &size) const {
+bool DisassemblerLLVMC::MCDisasmInstance::GetMCInst(const uint8_t *opcode_data,                                                     size_t opcode_data_len,
+                                                    lldb::addr_t pc,
+                                                    llvm::MCInst &mc_inst,
+                                                    size_t &size) const {
   llvm::ArrayRef<uint8_t> data(opcode_data, opcode_data_len);
   llvm::MCDisassembler::DecodeStatus status;
 
diff --git a/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp b/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp
index bedda863a2ae9..3cdb3191a5828 100644
--- a/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp
+++ b/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp
@@ -124,7 +124,8 @@ TEST_F(TestMCDisasmInstanceRISCV, TestOpcodeBytePrinter) {
     "021f 0000 1000"
   };
   // clang-format on
-  const unsigned num_of_expected_outputs = sizeof(expected_outputs) / sizeof(char *);
+  const unsigned num_of_expected_outputs =
+      sizeof(expected_outputs) / sizeof(char *);
 
   EXPECT_EQ(num_of_instructions, num_of_expected_outputs);
 

>From c58c4da0664763b5332e38f61c7c2116210c6346 Mon Sep 17 00:00:00 2001
From: Ted Woodward <tedw...@quicinc.com>
Date: Wed, 25 Jun 2025 14:22:28 -0700
Subject: [PATCH 9/9] [lldb] Improve disassembly of unknown instructions

LLDB uses the LLVM disassembler to determine the size of instructions and
to do the actual disassembly. Currently, if the LLVM disassembler can't
disassemble an instruction, LLDB will ignore the instruction size, assume
the instruction size is the minimum size for that device, print no useful
opcode, and print nothing for the instruction.

This patch changes this behavior to separate the instruction size and
"can't disassemble". If the LLVM disassembler knows the size, but can't
disassemble the instruction, LLDB will use that size. It will print out
the opcode, and will print "<unknown>" for the instruction. This is much
more useful to both a user and a script.

The impetus behind this change is to clean up RISC-V disassembly when
the LLVM disassembler doesn't understand all of the instructions.
RISC-V supports proprietary extensions, where the TD files don't know
about certain instructions, and the disassembler can't disassemble them.
Internal users want to be able to disassemble these instructions.

With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch modifies LLDB's disassembly to look
more like llvm-objdump's, and includes an example python script that adds
a command "fdis" that will disassemble, then pipe the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .

Changes in this PR:
- Decouple "can't disassemble" from "instruction size".
  DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for
    valid disassembly, and has the size as an out parameter.
  Use the size even if the disassembly is invalid.
  Disassemble if disassembly is valid.

- Always print out the opcode when -b is specified.
  Previously it wouldn't print out the opcode if it couldn't disassemble.

- Print out RISC-V opcodes the way llvm-objdump does.
  Code for the new Opcode Type eType16_32Tuples by Jason Molenda.

- Print <unknown> for instructions that can't be disassembled, matching
  llvm-objdump, instead of printing nothing.

- Update max riscv32 and riscv64 instruction size to 8.

- Add example "fdis" command script.

- Added disassembly byte test for x86 with known and unknown instructions.
  Added disassembly byte test for riscv32 with known and unknown instructions,
  with and without filtering.
  Added test from Jason Molenda to RISC-V disassembly unit tests.

Change-Id: Ie5a359d9e87a12dde79a8b5c9c7a146440a550c5
---
 .../Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp | 12 ------------
 llvm/docs/ReleaseNotes.md                            |  2 +-
 2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp b/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp
index 3cdb3191a5828..64177a2fac490 100644
--- a/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp
+++ b/lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp
@@ -61,12 +61,6 @@ TEST_F(TestMCDisasmInstanceRISCV, TestRISCV32Instruction) {
       arch, nullptr, nullptr, nullptr, nullptr, start_addr, &data, sizeof(data),
       num_of_instructions, false);
 
-  // If we failed to get a disassembler, we can assume it is because
-  // the llvm we linked against was not built with the riscv target,
-  // and we should skip these tests without marking anything as failing.
-  if (!disass_sp)
-    return;
-
   const InstructionList inst_list(disass_sp->GetInstructionList());
   EXPECT_EQ(num_of_instructions, inst_list.GetSize());
 
@@ -135,12 +129,6 @@ TEST_F(TestMCDisasmInstanceRISCV, TestOpcodeBytePrinter) {
       arch, nullptr, nullptr, nullptr, nullptr, start_addr, &data, sizeof(data),
       num_of_instructions, false);
 
-  // If we failed to get a disassembler, we can assume it is because
-  // the llvm we linked against was not built with the riscv target,
-  // and we should skip these tests without marking anything as failing.
-  if (!disass_sp)
-    return;
-
   const InstructionList inst_list(disass_sp->GetInstructionList());
   EXPECT_EQ(num_of_instructions, inst_list.GetSize());
 
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 80790f631a2fe..9fd28cef992b1 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -307,7 +307,7 @@ Changes to LLDB
 * Disassembly of unknown instructions now produces "<unknown>" instead of
   nothing at all
 * Changed the format of opcode bytes to match llvm-objdump when disassembling
-  RISC-V code with disassemble's --byte option.
+  RISC-V code with `disassemble`'s `--byte` option.
 
 ### Changes to lldb-dap
 

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
