[Lldb-commits] [lldb] [lldb] Change the implementation of Status to store an llvm::Error (NFC) (PR #106774)

2024-09-02 Thread Pavel Labath via lldb-commits


@@ -174,33 +233,91 @@ const char *Status::AsCString(const char 
*default_error_str) const {
 
 // Clear the error and any cached error string that it might contain.
 void Status::Clear() {
-  m_code = 0;
-  m_type = eErrorTypeInvalid;
-  m_string.clear();
+  if (m_error)
+LLDB_LOG_ERRORV(GetLog(LLDBLog::API), std::move(m_error),
+"dropping error {0}");
+  m_error = llvm::Error::success();
+  llvm::consumeError(std::move(m_error));
 }
 
 // Access the error value.
-Status::ValueType Status::GetError() const { return m_code; }
+Status::ValueType Status::GetError() const {
+  Status::ValueType result = 0;
+  llvm::visitErrors(m_error, [&](const llvm::ErrorInfoBase &error) {
+std::error_code ec = error.convertToErrorCode();
+if (ec.category() == std::generic_category() ||
+ec.category() == generic_category() ||
+ec.category() == expression_category())
+  result = ec.value();

labath wrote:

How about just returning `ec.value()` regardless of the category?

https://github.com/llvm/llvm-project/pull/106774
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Change the implementation of Status to store an llvm::Error (NFC) (PR #106774)

2024-09-02 Thread Pavel Labath via lldb-commits


@@ -96,16 +124,44 @@ Status Status::FromErrorStringWithFormat(const char 
*format, ...) {
   return Status(string);
 }
 
-llvm::Error Status::ToError() const {
-  if (Success())
-return llvm::Error::success();
-  if (m_type == ErrorType::eErrorTypePOSIX)
-return llvm::errorCodeToError(
-std::error_code(m_code, std::generic_category()));
-  return llvm::createStringError(AsCString());
+Status Status::FromExpressionError(lldb::ExpressionResults result,
+   std::string msg) {
+  return Status(llvm::make_error(
+  std::error_code(result, expression_category()), msg));
 }
 
-Status::~Status() = default;
+/// Creates a deep copy of all known errors and converts all other
+/// errors to a new llvm::StringError.
+static llvm::Error cloneError(llvm::Error &error) {
+  llvm::Error result = llvm::Error::success();
+  llvm::consumeError(std::move(result));
+  llvm::visitErrors(error, [&](const llvm::ErrorInfoBase &error) {
+if (error.isA())
+  result = llvm::make_error(error.convertToErrorCode());
+else if (error.isA())
+  result = llvm::make_error(error.convertToErrorCode());
+else if (error.isA())
+  result = llvm::make_error(error.convertToErrorCode(),
+ error.message());
+else
+  result =
+  llvm::createStringError(error.convertToErrorCode(), error.message());
+  });
+  return result;
+}
+
+Status Status::FromError(llvm::Error &&error) {
+  // Existing clients assume that the conversion to Status consumes
+  // and destroys the error. Use cloneError to convert all unnown
+  // errors to strings.
+  llvm::Error clone = cloneError(error);
+  llvm::consumeError(std::move(error));

labath wrote:

I'm still confused by this. Why isn't this just `return 
Status(std::move(error))` ?

https://github.com/llvm/llvm-project/pull/106774
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Change the implementation of Status to store an llvm::Error (NFC) (PR #106774)

2024-09-02 Thread Pavel Labath via lldb-commits


@@ -174,33 +233,91 @@ const char *Status::AsCString(const char 
*default_error_str) const {
 
 // Clear the error and any cached error string that it might contain.
 void Status::Clear() {
-  m_code = 0;
-  m_type = eErrorTypeInvalid;
-  m_string.clear();
+  if (m_error)
+LLDB_LOG_ERRORV(GetLog(LLDBLog::API), std::move(m_error),
+"dropping error {0}");
+  m_error = llvm::Error::success();
+  llvm::consumeError(std::move(m_error));
 }
 
 // Access the error value.
-Status::ValueType Status::GetError() const { return m_code; }
+Status::ValueType Status::GetError() const {
+  Status::ValueType result = 0;
+  llvm::visitErrors(m_error, [&](const llvm::ErrorInfoBase &error) {
+std::error_code ec = error.convertToErrorCode();
+if (ec.category() == std::generic_category() ||
+ec.category() == generic_category() ||
+ec.category() == expression_category())
+  result = ec.value();
+else
+  result = 0xff;

labath wrote:

A more insteresting case is what should this return in case of multiple errors? 
Right now, it will return the last code, but maybe this is a better case for a 
special error code?

https://github.com/llvm/llvm-project/pull/106774
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Change the implementation of Status to store an llvm::Error (NFC) (PR #106774)

2024-09-02 Thread Pavel Labath via lldb-commits

https://github.com/labath edited 
https://github.com/llvm/llvm-project/pull/106774
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][AIX] Taking Linux Host Info header's base for AIX (PR #106910)

2024-09-02 Thread Pavel Labath via lldb-commits

https://github.com/labath edited 
https://github.com/llvm/llvm-project/pull/106910
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][AIX] Taking Linux Host Info header's base for AIX (PR #106910)

2024-09-02 Thread Pavel Labath via lldb-commits

https://github.com/labath commented:

So, how different is AIX from linux? Should we be treating it as a flavour of 
linux. E.g., have HostInfoAIX derive from HostInfoLinux instead of copying it ?

https://github.com/llvm/llvm-project/pull/106910
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][AIX] Taking Linux Host Info header's base for AIX (PR #106910)

2024-09-02 Thread Pavel Labath via lldb-commits


@@ -38,6 +38,12 @@ endif()
 include(LLDBConfig)
 include(AddLLDB)
 
+# This has been added to keep the AIX build isolated for now.
+# It will need to be modified later.
+if (UNIX AND ${CMAKE_SYSTEM_NAME} MATCHES "AIX")
+  add_definitions("-D__AIX__")

labath wrote:

We shouldn't be defining a reserved identifier. Also, isn't there an existing 
macro that we could test use to check if we're on AIX?

https://github.com/llvm/llvm-project/pull/106910
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [LLDB] Finish implementing support for DW_FORM_data16 (PR #106799)

2024-09-02 Thread Pavel Labath via lldb-commits

labath wrote:

> Sadly, I haven't figured out a good way to test this.

`test/Shell/SymbolFile/DWARF/x86/DW_AT_const_value.s` tests various encodings 
of DW_AT_const_value.

It doesn't do DW_FORM_data16, most likely because it was not working at the 
time. You can add another case to the file to exercise this code path.

https://github.com/llvm/llvm-project/pull/106799
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Better matching of types in anonymous namespaces (PR #102111)

2024-09-01 Thread Pavel Labath via lldb-commits

https://github.com/labath closed 
https://github.com/llvm/llvm-project/pull/102111
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Better matching of types in anonymous namespaces (PR #102111)

2024-09-01 Thread Pavel Labath via lldb-commits

labath wrote:

No worries, I was out a large part of August anyway.

https://github.com/llvm/llvm-project/pull/102111
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Add Status::Detail to store expression evaluator diagnostics [… (PR #106442)

2024-08-29 Thread Pavel Labath via lldb-commits

labath wrote:

Not really (although this might work as well). What I meant was that, AIUI 
you're doing this so that you can plumb structured error information from one 
place (clang expression parser?) to another place (the thing which generates 
the expression errors -- I guess somewhere CommandObjectExpression?). That 
could be done by creating a (custom) llvm::Error in the first place, and 
consuming it in the second one *and* by making sure that all of the functions 
along that path don't lose this information by converting the error to a 
lldb_private::Status. I haven't looked at how hard much changes would that 
require, but I'm hoping that will be a single relatively well defined path that 
can be ported with reasonable effort. Even if there's e.g. a virtual function 
somewhere along the path, we don't necessarily have to port all implementations 
of that function immediately. We could just port the ones we need, and provide 
a shim so that the other implementations can remain unchanged...

https://github.com/llvm/llvm-project/pull/106442
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][test] Improve toolchain detection in Makefile.rules (PR #102185)

2024-08-29 Thread Pavel Labath via lldb-commits


@@ -96,16 +98,119 @@ def getArchSpec(self, architecture):
 """
 return ["ARCH=" + architecture] if architecture else []
 
-def getCCSpec(self, compiler):
+def getToolchainSpec(self, compiler):
 """
-Helper function to return the key-value string to specify the compiler
+Helper function to return the key-value strings to specify the 
toolchain
 used for the make system.
 """
 cc = compiler if compiler else None
 if not cc and configuration.compiler:
 cc = configuration.compiler
+
 if cc:
-return ['CC="%s"' % cc]
+exe_ext = ""
+if lldbplatformutil.getHostPlatform() == "windows":
+exe_ext = ".exe"
+
+cc = cc.strip()
+cc_path = pathlib.Path(cc)
+
+# We can get CC compiler string in the following formats:
+#  [] - such as 'xrun clang', 'xrun 
/usr/bin/clang' & etc
+#
+# Where  could contain the following parts:
+#   [.]   - sucn as 
'clang', 'clang.exe' ('clang-cl.exe'?)
+#   -[.]   - such as 
'armv7-linux-gnueabi-gcc'
+#   /[.]- such as 
'/usr/bin/clang', 'c:\path\to\compiler\clang,exe'
+#   /-[.]- such as 
'/usr/bin/clang', 'c:\path\to\compiler\clang,exe'
+
+cc_ext = cc_path.suffix
+# Compiler name without extension
+cc_name = cc_path.stem.split(" ")[-1]
+
+# A kind of compiler (canonical name): clang, gcc, cc & etc.
+cc_type = cc_name
+# A triple prefix of compiler name: gcc
+cc_prefix = ""
+if not "clang-cl" in cc_name and not "llvm-gcc" in cc_name:
+cc_name_parts = cc_name.split("-")
+cc_type = cc_name_parts[-1]
+if len(cc_name_parts) > 1:
+cc_prefix = "-".join(cc_name_parts[:-1]) + "-"
+
+# A kind of C++ compiler.
+cxx_types = {
+"icc": "icpc",
+"llvm-gcc": "llvm-g++",
+"gcc": "g++",
+"cc": "c++",
+"clang": "clang++",
+}
+cxx_type = cxx_types.get(cc_type, cc_type)
+
+cc_dir = cc_path.parent
+
+def getLlvmUtil(util_name):
+llvm_tools_dir = os.getenv("LLVM_TOOLS_DIR", cc_dir)
+return os.path.join(llvm_tools_dir, util_name + exe_ext)
+
+def getToolchainUtil(util_name):
+return cc_dir / (cc_prefix + util_name + cc_ext)
+
+cxx = getToolchainUtil(cxx_type)
+
+util_names = {
+"OBJCOPY": "objcopy",
+"STRIP": "objcopy",

labath wrote:

Can we get rid of this variable? I don't think it's very helpful. Even though 
`objcopy` supports (I think) all of the functionality of `strip`, their args 
are different, so I think this could be pretty confusing.

https://github.com/llvm/llvm-project/pull/102185
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][test] Improve toolchain detection in Makefile.rules (PR #102185)

2024-08-29 Thread Pavel Labath via lldb-commits

https://github.com/labath commented:

Looks pretty good now, but I still have some questions.

https://github.com/llvm/llvm-project/pull/102185
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][test] Improve toolchain detection in Makefile.rules (PR #102185)

2024-08-29 Thread Pavel Labath via lldb-commits


@@ -213,7 +229,7 @@ endif
 LIMIT_DEBUG_INFO_FLAGS =
 NO_LIMIT_DEBUG_INFO_FLAGS =
 MODULE_DEBUG_INFO_FLAGS =
-ifneq (,$(findstring clang,$(CC)))
+ifeq ($(CCC), clang)

labath wrote:

So the last `C` stands for "type" ? :)

Can we call it CC_TYPE/KIND (or similar) instead? It's not like we're using it 
that often...

https://github.com/llvm/llvm-project/pull/102185
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][test] Improve toolchain detection in Makefile.rules (PR #102185)

2024-08-29 Thread Pavel Labath via lldb-commits


@@ -96,16 +98,119 @@ def getArchSpec(self, architecture):
 """
 return ["ARCH=" + architecture] if architecture else []
 
-def getCCSpec(self, compiler):
+def getToolchainSpec(self, compiler):
 """
-Helper function to return the key-value string to specify the compiler
+Helper function to return the key-value strings to specify the 
toolchain
 used for the make system.
 """
 cc = compiler if compiler else None
 if not cc and configuration.compiler:
 cc = configuration.compiler
+
 if cc:

labath wrote:

```suggestion
if not cc:
  return []
```

.. and then unindent the rest of the code

https://github.com/llvm/llvm-project/pull/102185
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][test] Improve toolchain detection in Makefile.rules (PR #102185)

2024-08-29 Thread Pavel Labath via lldb-commits


@@ -96,16 +98,119 @@ def getArchSpec(self, architecture):
 """
 return ["ARCH=" + architecture] if architecture else []
 
-def getCCSpec(self, compiler):
+def getToolchainSpec(self, compiler):
 """
-Helper function to return the key-value string to specify the compiler
+Helper function to return the key-value strings to specify the 
toolchain
 used for the make system.
 """
 cc = compiler if compiler else None
 if not cc and configuration.compiler:
 cc = configuration.compiler
+
 if cc:
-return ['CC="%s"' % cc]
+exe_ext = ""
+if lldbplatformutil.getHostPlatform() == "windows":
+exe_ext = ".exe"
+
+cc = cc.strip()
+cc_path = pathlib.Path(cc)
+
+# We can get CC compiler string in the following formats:
+#  [] - such as 'xrun clang', 'xrun 
/usr/bin/clang' & etc
+#
+# Where  could contain the following parts:
+#   [.]   - sucn as 
'clang', 'clang.exe' ('clang-cl.exe'?)
+#   -[.]   - such as 
'armv7-linux-gnueabi-gcc'
+#   /[.]- such as 
'/usr/bin/clang', 'c:\path\to\compiler\clang,exe'
+#   /-[.]- such as 
'/usr/bin/clang', 'c:\path\to\compiler\clang,exe'
+
+cc_ext = cc_path.suffix
+# Compiler name without extension
+cc_name = cc_path.stem.split(" ")[-1]
+
+# A kind of compiler (canonical name): clang, gcc, cc & etc.
+cc_type = cc_name
+# A triple prefix of compiler name: gcc
+cc_prefix = ""
+if not "clang-cl" in cc_name and not "llvm-gcc" in cc_name:
+cc_name_parts = cc_name.split("-")
+cc_type = cc_name_parts[-1]
+if len(cc_name_parts) > 1:
+cc_prefix = "-".join(cc_name_parts[:-1]) + "-"
+
+# A kind of C++ compiler.
+cxx_types = {
+"icc": "icpc",
+"llvm-gcc": "llvm-g++",
+"gcc": "g++",
+"cc": "c++",
+"clang": "clang++",
+}
+cxx_type = cxx_types.get(cc_type, cc_type)
+
+cc_dir = cc_path.parent
+
+def getLlvmUtil(util_name):
+llvm_tools_dir = os.getenv("LLVM_TOOLS_DIR", cc_dir)
+return os.path.join(llvm_tools_dir, util_name + exe_ext)
+
+def getToolchainUtil(util_name):
+return cc_dir / (cc_prefix + util_name + cc_ext)
+
+cxx = getToolchainUtil(cxx_type)
+
+util_names = {
+"OBJCOPY": "objcopy",
+"STRIP": "objcopy",
+"ARCHIVER": "ar",
+"DWP": "dwp",
+}
+utils = []
+
+# Note: LLVM_AR is currently required by API TestBSDArchives.py 
tests.
+# Assembly a full path to llvm-ar for given LLVM_TOOLS_DIR;

labath wrote:

```suggestion
# Assemble a full path to llvm-ar for given LLVM_TOOLS_DIR;
```

https://github.com/llvm/llvm-project/pull/102185
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][test] Improve toolchain detection in Makefile.rules (PR #102185)

2024-08-29 Thread Pavel Labath via lldb-commits

https://github.com/labath edited 
https://github.com/llvm/llvm-project/pull/102185
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][test] Improve toolchain detection in Makefile.rules (PR #102185)

2024-08-29 Thread Pavel Labath via lldb-commits


@@ -96,16 +98,120 @@ def getArchSpec(self, architecture):
 """
 return ["ARCH=" + architecture] if architecture else []
 
-def getCCSpec(self, compiler):
+def getToolchainSpec(self, compiler):
 """
-Helper function to return the key-value string to specify the compiler
+Helper function to return the key-value strings to specify the 
toolchain
 used for the make system.
 """
 cc = compiler if compiler else None
 if not cc and configuration.compiler:
 cc = configuration.compiler
+
 if cc:
-return ['CC="%s"' % cc]
+exe_ext = ""
+if lldbplatformutil.getHostPlatform() == "windows":
+exe_ext = ".exe"
+
+cc = cc.strip()
+cc_path = pathlib.Path(cc)
+cc = cc_path.as_posix()
+
+# We can get CC compiler string in the following formats:
+#  [] - such as 'xrun clang', 'xrun 
/usr/bin/clang' & etc
+#
+# Where  could contain the following parts:
+#   [.]   - sucn as 
'clang', 'clang.exe' ('clamg-cl.exe'?)
+#   -[.]   - such as 
'armv7-linux-gnueabi-gcc'
+#   /[.]- such as 
'/usr/bin/clang', 'c:\path\to\compiler\clang,exe'
+#   /-[.]- such as 
'/usr/bin/clang', 'c:\path\to\compiler\clang,exe'
+
+cc_ext = cc_path.suffix
+# Compiler name without extension
+cc_name = cc_path.stem.split(" ")[-1]
+
+# A kind of compiler (canonical name): clang, gcc, cc & etc.
+cc_type = cc_name
+# A triple prefix of compiler name: gcc
+cc_prefix = ""
+if not "clang-cl" in cc_name and not "llvm-gcc" in cc_name:
+cc_name_parts = cc_name.split("-")
+cc_type = cc_name_parts[-1]
+if len(cc_name_parts) > 1:
+cc_prefix = "-".join(cc_name_parts[:-1]) + "-"
+
+# A kind of C++ compiler.
+cxx_types = {
+"icc": "icpc",
+"llvm-gcc": "llvm-g++",
+"gcc": "g++",
+"cc": "c++",
+"clang": "clang++",
+}
+cxx_type = cxx_types.get(cc_type, cc_type)
+
+cc_dir = cc_path.parent
+
+def getLlvmUtil(util_name):
+llvm_tools_dir = os.getenv("LLVM_TOOLS_DIR", cc_dir.as_posix())
+return llvm_tools_dir + "/" + util_name + exe_ext
+
+def getToolchainUtil(util_name):
+return (cc_dir / (cc_prefix + util_name + cc_ext)).as_posix()
+
+cxx = getToolchainUtil(cxx_type)
+
+util_names = {
+"OBJCOPY": "objcopy",
+"STRIP": "objcopy",
+"ARCHIVER": "ar",
+"DWP": "dwp",
+}
+utils = []
+
+# Note: LLVM_AR is currently required by API TestBSDArchives.py 
tests.
+# Assembly a full path to llvm-ar for given LLVM_TOOLS_DIR;
+# otherwise assume we have llvm-ar at the same place where is CC 
specified.
+if not os.getenv("LLVM_AR"):
+llvm_ar = getLlvmUtil("llvm-ar")
+utils.extend(['LLVM_AR="%s"' % llvm_ar])
+
+if not lldbplatformutil.platformIsDarwin():
+if os.getenv("USE_LLVM_TOOLS"):

labath wrote:

I'm still puzzled by this..

https://github.com/llvm/llvm-project/pull/102185
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Better matching of types in anonymous namespaces (PR #102111)

2024-08-29 Thread Pavel Labath via lldb-commits

labath wrote:

Any more thoughts on this, @clayborg ?

https://github.com/llvm/llvm-project/pull/102111
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-29 Thread Pavel Labath via lldb-commits

labath wrote:

The linux read fix is in #106532.

https://github.com/llvm/llvm-project/pull/104193
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb/linux] Make truncated reads work (PR #106532)

2024-08-29 Thread Pavel Labath via lldb-commits

https://github.com/labath updated 
https://github.com/llvm/llvm-project/pull/106532

>From 610757ced98139d3d10451dcca195692b0c2d832 Mon Sep 17 00:00:00 2001
From: Pavel Labath 
Date: Thu, 29 Aug 2024 12:26:01 +0200
Subject: [PATCH 1/2] [lldb/linux] Make truncated reads work

Previously, we were returning an error if we couldn't read the whole
region. This doesn't matter most of the time, because lldb caches memory
reads, and in that process it aligns them to cache line boundaries. As
(LLDB) cache lines are smaller than pages, the reads are unlikely to
cross page boundaries.

Nonetheless, this can cause a problem for large reads (which bypass the
cache), where we're unable to read anything even if just a single byte
of the memory is unreadable. This patch fixes the lldb-server to do
that, and also changes the linux implementation, to reuse any partial
results it got from the process_vm_readv call (to avoid having to
re-read everything again using ptrace, only to find that it stopped at
the same place).

This matches debugserver behavior. It is also consistent with the gdb
remote protocol documentation, but -- notably -- not with actual
gdbserver behavior (which returns errors instead of partial results). We
filed a
[clarification bug](https://sourceware.org/bugzilla/show_bug.cgi?id=24751)
several years ago. Though we did not really reach a conclusion there, I
think this is the most logical behavior.

The associated test does not currently pass on windows, because the
windows memory read APIs don't support partial reads (I have a WIP patch
to work around that).
---
 .../Process/Linux/NativeProcessLinux.cpp  | 41 -
 .../GDBRemoteCommunicationServerLLGS.cpp  | 20 +
 .../API/functionalities/memory/holes/Makefile |  3 +
 .../memory/holes/TestMemoryHoles.py   | 63 ++
 .../API/functionalities/memory/holes/main.cpp | 87 +++
 5 files changed, 176 insertions(+), 38 deletions(-)
 create mode 100644 lldb/test/API/functionalities/memory/holes/Makefile
 create mode 100644 
lldb/test/API/functionalities/memory/holes/TestMemoryHoles.py
 create mode 100644 lldb/test/API/functionalities/memory/holes/main.cpp

diff --git a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp 
b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
index cb95da9fc72363..1e2e3a80b18bf6 100644
--- a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
+++ b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
@@ -1641,6 +1641,10 @@ 
NativeProcessLinux::GetSoftwareBreakpointTrapOpcode(size_t size_hint) {
 
 Status NativeProcessLinux::ReadMemory(lldb::addr_t addr, void *buf, size_t 
size,
   size_t &bytes_read) {
+  Log *log = GetLog(POSIXLog::Memory);
+  LLDB_LOG(log, "addr = {0}, buf = {1}, size = {2}", addr, buf, size);
+
+  bytes_read = 0;
   if (ProcessVmReadvSupported()) {
 // The process_vm_readv path is about 50 times faster than ptrace api. We
 // want to use this syscall if it is supported.
@@ -1651,32 +1655,29 @@ Status NativeProcessLinux::ReadMemory(lldb::addr_t 
addr, void *buf, size_t size,
 remote_iov.iov_base = reinterpret_cast(addr);
 remote_iov.iov_len = size;
 
-bytes_read = process_vm_readv(GetCurrentThreadID(), &local_iov, 1,
-  &remote_iov, 1, 0);
-const bool success = bytes_read == size;
+ssize_t read_result = process_vm_readv(GetCurrentThreadID(), &local_iov, 1,
+   &remote_iov, 1, 0);
+int error = 0;
+if (read_result < 0)
+  error = errno;
+else
+  bytes_read = read_result;
 
-Log *log = GetLog(POSIXLog::Process);
 LLDB_LOG(log,
- "using process_vm_readv to read {0} bytes from inferior "
- "address {1:x}: {2}",
- size, addr, success ? "Success" : llvm::sys::StrError(errno));
-
-if (success)
-  return Status();
-// else the call failed for some reason, let's retry the read using ptrace
-// api.
+ "process_vm_readv({0}, [iovec({1}, {2})], [iovec({3:x}, {2})], 1, 
"
+ "0) => {4} ({5})",
+ GetCurrentThreadID(), buf, size, addr, read_result,
+ error > 0 ? llvm::sys::StrError(errno) : "sucesss");
   }
 
   unsigned char *dst = static_cast(buf);
   size_t remainder;
   long data;
 
-  Log *log = GetLog(POSIXLog::Memory);
-  LLDB_LOG(log, "addr = {0}, buf = {1}, size = {2}", addr, buf, size);
-
-  for (bytes_read = 0; bytes_read < size; bytes_read += remainder) {
+  for (; bytes_read < size; bytes_read += remainder) {
 Status error = NativeProcessLinux::PtraceWrapper(
-PTRACE_PEEKDATA, GetCurrentThreadID(), (void *)addr, nullptr, 0, 
&data);
+PTRACE_PEEKDATA, GetCurrentThreadID(),
+reinterpret_cast(addr + bytes_read), nullptr, 0, &data);
 if (error.Fail())
   return error;
 
@@ -1684,11 +1685,7 @@ Status NativeProcessLinux::ReadMemory(lldb::addr_t addr, 
void *buf, 

[Lldb-commits] [lldb] [lldb/linux] Make truncated reads work (PR #106532)

2024-08-29 Thread Pavel Labath via lldb-commits


@@ -2526,28 +2526,16 @@ GDBRemoteCommunicationServerLLGS::Handle_memory_read(
   size_t bytes_read = 0;
   Status error = m_current_process->ReadMemoryWithoutTrap(
   read_addr, &buf[0], byte_count, bytes_read);
-  if (error.Fail()) {
-LLDB_LOGF(log,
-  "GDBRemoteCommunicationServerLLGS::%s pid %" PRIu64
-  " mem 0x%" PRIx64 ": failed to read. Error: %s",
-  __FUNCTION__, m_current_process->GetID(), read_addr,
-  error.AsCString());
+  LLDB_LOG(log, "ReadMemoryWithoutTrap({0}) read {1}/{2} bytes (error: {3})",
+   read_addr, byte_count, bytes_read, error);
+  if (bytes_read == 0)

labath wrote:

This was changed to not assume that an error means nothing was read.

https://github.com/llvm/llvm-project/pull/106532
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb/linux] Make truncated reads work (PR #106532)

2024-08-29 Thread Pavel Labath via lldb-commits


@@ -0,0 +1,63 @@
+"""
+Test the memory commands operating on memory regions with holes
+"""
+
+import lldb
+from lldbsuite.test.lldbtest import *
+import lldbsuite.test.lldbutil as lldbutil
+from lldbsuite.test.decorators import *
+
+
+class MemoryHolesTestCase(TestBase):
+NO_DEBUG_INFO_TESTCASE = True
+
+def setUp(self):
+super().setUp()
+# Find the line number to break inside main().
+self.line = line_number("main.cpp", "// break here")
+
+def _prepare_inferior(self):
+self.build()
+exe = self.getBuildArtifact("a.out")
+self.runCmd("file " + exe, CURRENT_EXECUTABLE_SET)
+
+# Break in main() after the variables are assigned values.
+lldbutil.run_break_set_by_file_and_line(
+self, "main.cpp", self.line, num_expected_locations=1, 
loc_exact=True
+)
+
+self.runCmd("run", RUN_SUCCEEDED)
+
+# The stop reason of the thread should be breakpoint.
+self.expect(
+"thread list",
+STOPPED_DUE_TO_BREAKPOINT,
+substrs=["stopped", "stop reason = breakpoint"],
+)
+
+# The breakpoint should have a hit count of 1.
+lldbutil.check_breakpoint(self, bpno=1, expected_hit_count=1)
+
+# Avoid the expression evaluator, as it can allocate allocate memory
+# inside the holes we've deliberately left empty.
+self.memory = 
self.frame().FindVariable("mem_with_holes").GetValueAsUnsigned()
+self.pagesize = 
self.frame().FindVariable("pagesize").GetValueAsUnsigned()
+positions = self.frame().FindVariable("positions")
+self.positions = [
+positions.GetChildAtIndex(i).GetValueAsUnsigned()
+for i in range(positions.GetNumChildren())
+]
+self.assertEqual(len(self.positions), 5)
+
+@expectedFailureWindows
+def test_memory_read(self):
+self._prepare_inferior()
+
+error = lldb.SBError()
+content = self.process().ReadMemory(self.memory, 2 * self.pagesize, 
error)
+self.assertEqual(len(content), self.pagesize)
+self.assertEqual(content[0:7], b"needle\0")
+self.assertTrue(error.Fail())
+self.assertEqual(
+f"memory read failed for {self.memory+self.pagesize:#x}", 
error.GetCString()
+)

labath wrote:

I guess this kind of error behavior can be useful in some situations (you get 
the memory that can be read *and* the error for the part that cannot), but I've 
found it very counterintuitive to use. I'd definitely be open to changing that 
(returning an error only if we didn't read anything).

https://github.com/llvm/llvm-project/pull/106532
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb/linux] Make truncated reads work (PR #106532)

2024-08-29 Thread Pavel Labath via lldb-commits

https://github.com/labath created 
https://github.com/llvm/llvm-project/pull/106532

Previously, we were returning an error if we couldn't read the whole region. 
This doesn't matter most of the time, because lldb caches memory reads, and in 
that process it aligns them to cache line boundaries. As (LLDB) cache lines are 
smaller than pages, the reads are unlikely to cross page boundaries.

Nonetheless, this can cause a problem for large reads (which bypass the cache), 
where we're unable to read anything even if just a single byte of the memory is 
unreadable. This patch fixes the lldb-server to do that, and also changes the 
linux implementation, to reuse any partial results it got from the 
process_vm_readv call (to avoid having to re-read everything again using 
ptrace, only to find that it stopped at the same place).

This matches debugserver behavior. It is also consistent with the gdb remote 
protocol documentation, but -- notably -- not with actual gdbserver behavior 
(which returns errors instead of partial results). We filed a
[clarification bug](https://sourceware.org/bugzilla/show_bug.cgi?id=24751) 
several years ago. Though we did not really reach a conclusion there, I think 
this is the most logical behavior.

The associated test does not currently pass on windows, because the windows 
memory read APIs don't support partial reads (I have a WIP patch to work around 
that).

>From 610757ced98139d3d10451dcca195692b0c2d832 Mon Sep 17 00:00:00 2001
From: Pavel Labath 
Date: Thu, 29 Aug 2024 12:26:01 +0200
Subject: [PATCH] [lldb/linux] Make truncated reads work

Previously, we were returning an error if we couldn't read the whole
region. This doesn't matter most of the time, because lldb caches memory
reads, and in that process it aligns them to cache line boundaries. As
(LLDB) cache lines are smaller than pages, the reads are unlikely to
cross page boundaries.

Nonetheless, this can cause a problem for large reads (which bypass the
cache), where we're unable to read anything even if just a single byte
of the memory is unreadable. This patch fixes the lldb-server to do
that, and also changes the linux implementation, to reuse any partial
results it got from the process_vm_readv call (to avoid having to
re-read everything again using ptrace, only to find that it stopped at
the same place).

This matches debugserver behavior. It is also consistent with the gdb
remote protocol documentation, but -- notably -- not with actual
gdbserver behavior (which returns errors instead of partial results). We
filed a
[clarification bug](https://sourceware.org/bugzilla/show_bug.cgi?id=24751)
several years ago. Though we did not really reach a conclusion there, I
think this is the most logical behavior.

The associated test does not currently pass on windows, because the
windows memory read APIs don't support partial reads (I have a WIP patch
to work around that).
---
 .../Process/Linux/NativeProcessLinux.cpp  | 41 -
 .../GDBRemoteCommunicationServerLLGS.cpp  | 20 +
 .../API/functionalities/memory/holes/Makefile |  3 +
 .../memory/holes/TestMemoryHoles.py   | 63 ++
 .../API/functionalities/memory/holes/main.cpp | 87 +++
 5 files changed, 176 insertions(+), 38 deletions(-)
 create mode 100644 lldb/test/API/functionalities/memory/holes/Makefile
 create mode 100644 
lldb/test/API/functionalities/memory/holes/TestMemoryHoles.py
 create mode 100644 lldb/test/API/functionalities/memory/holes/main.cpp

diff --git a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp 
b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
index cb95da9fc72363..1e2e3a80b18bf6 100644
--- a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
+++ b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
@@ -1641,6 +1641,10 @@ 
NativeProcessLinux::GetSoftwareBreakpointTrapOpcode(size_t size_hint) {
 
 Status NativeProcessLinux::ReadMemory(lldb::addr_t addr, void *buf, size_t 
size,
   size_t &bytes_read) {
+  Log *log = GetLog(POSIXLog::Memory);
+  LLDB_LOG(log, "addr = {0}, buf = {1}, size = {2}", addr, buf, size);
+
+  bytes_read = 0;
   if (ProcessVmReadvSupported()) {
 // The process_vm_readv path is about 50 times faster than ptrace api. We
 // want to use this syscall if it is supported.
@@ -1651,32 +1655,29 @@ Status NativeProcessLinux::ReadMemory(lldb::addr_t 
addr, void *buf, size_t size,
 remote_iov.iov_base = reinterpret_cast(addr);
 remote_iov.iov_len = size;
 
-bytes_read = process_vm_readv(GetCurrentThreadID(), &local_iov, 1,
-  &remote_iov, 1, 0);
-const bool success = bytes_read == size;
+ssize_t read_result = process_vm_readv(GetCurrentThreadID(), &local_iov, 1,
+   &remote_iov, 1, 0);
+int error = 0;
+if (read_result < 0)
+  error = errno;
+else
+  bytes_read = read_result;
 
-Log *log = GetLog(POSIX

[Lldb-commits] [lldb] [lldb] Updated TCPSocket to listen multiple ports on the same single thread (PR #104797)

2024-08-29 Thread Pavel Labath via lldb-commits

labath wrote:

> @labath
> 
> > Having a single socket listen on multiple ports sounds like a bad idea to 
> > me.
> 
> Note that currently the class TCPSocket already contains a list of 
> NativeSocket `m_listen_sockets`.

I am aware of that, and I'm not entirely happy with how the class implements 
listening. I think it would be better to have a separate class for a listening 
socket, because those need a completely different APIs. But, even ignoring 
that, I think there's a difference between listening on two addresses with the 
same port (I believe the only way to reach that state is if a hostname resolves 
to multiple addresses, in which case one really could argue that it's the same 
"logical" address), and two addresses with completely unrelated ports.

> We do not need 2 TCPSocket instances with 2 separated lists of native sockets 
> even with a common MainLoop.
> 
> We have 2 options:
> 
> * A) 2 threads - first one calls TCPSocket::Accept() for the platform 
> connection, second calls TCPSocket::Accept() for the gdb connection
> 
> * B) 1 thread - a common TCPSocket::Accept() can accept platform and gdb 
> connections from both ports

We have (at least) three options, the third one being what I outlined in the 
previous message. I'm sorry this work of yours is going to waste. I could've 
been more explicit about what I wanted to do on the first PR, but I was more 
focused on what I don't want to do.

> 
> 
> I have implemented the option B. It was easy because the class TCPSocket 
> already contains `m_listen_sockets`.
> 
> > Then we could create two sockets and have them listen on the same main loop 
> > instance.
> 
> It is already implemented inside TCPSocket but for different addresses. 

That's true, but I draw a different conclusion from that. Instead of saying 
"this should be extended to support multiple ports", my take is that "this 
wasn't a good place for multiplexing to begin with".

> I just added different ports.
> 
> The changes are minimal really. 

Yes, but that's because you extended a stringly typed api to do that you 
wanted. That function is used from other places as well, and this means that 
other places in lldb (including user-facing code, I believe) can accept these 
multi-port connection strings. I don't think we want to support that.

https://github.com/llvm/llvm-project/pull/104797
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-29 Thread Pavel Labath via lldb-commits

https://github.com/labath updated 
https://github.com/llvm/llvm-project/pull/104193

>From 7b8b8b699902d2365ea43e5ae015546c4d20fac8 Mon Sep 17 00:00:00 2001
From: Pavel Labath 
Date: Wed, 14 Aug 2024 19:58:27 +0200
Subject: [PATCH 1/5] [lldb] Fix and speedup the `memory find` command

This patch fixes an issue where the `memory find` command would
effectively stop searching after encountering a memory read error (which
could happen due to unreadable memory), without giving any indication
that it has done so (it would just print it could not find the pattern).

To make matters worse, it would not terminate after encountering this
error, but rather proceed to slowly increment the address pointer, which
meant that searching a large region could take a very long time (and
give the appearance that lldb is actually searching for the thing).

The patch fixes this first problem (*) by detecting read errors and
skipping over (using GetMemoryRegionInfo) the unreadable parts of memory
and resuming the search after them. It also reads the memory in bulk (up
to 1MB), which speeds up the search significantly (up to 6x for live
processes, 18x for core files).

(*) The fix does not work on windows yet, because the ReadMemory API
does not return partial results (like it does for other systems). I'm
preparing a separate patch to deal with that.
---
 lldb/source/Target/Process.cpp| 68 +++
 .../memory/find/TestMemoryFind.py | 30 ++-
 .../API/functionalities/memory/find/main.cpp  | 83 +--
 3 files changed, 136 insertions(+), 45 deletions(-)

diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index b2a0f13b9a1549..5d066d264bd3f5 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -114,33 +114,6 @@ class ProcessOptionValueProperties
   }
 };
 
-class ProcessMemoryIterator {
-public:
-  ProcessMemoryIterator(Process &process, lldb::addr_t base)
-  : m_process(process), m_base_addr(base) {}
-
-  bool IsValid() { return m_is_valid; }
-
-  uint8_t operator[](lldb::addr_t offset) {
-if (!IsValid())
-  return 0;
-
-uint8_t retval = 0;
-Status error;
-if (0 == m_process.ReadMemory(m_base_addr + offset, &retval, 1, error)) {
-  m_is_valid = false;
-  return 0;
-}
-
-return retval;
-  }
-
-private:
-  Process &m_process;
-  const lldb::addr_t m_base_addr;
-  bool m_is_valid = true;
-};
-
 static constexpr OptionEnumValueElement g_follow_fork_mode_values[] = {
 {
 eFollowParent,
@@ -3367,21 +3340,48 @@ lldb::addr_t Process::FindInMemory(lldb::addr_t low, 
lldb::addr_t high,
   if (region_size < size)
 return LLDB_INVALID_ADDRESS;
 
+  // See "Boyer-Moore string search algorithm".
   std::vector bad_char_heuristic(256, size);
-  ProcessMemoryIterator iterator(*this, low);
-
   for (size_t idx = 0; idx < size - 1; idx++) {
 decltype(bad_char_heuristic)::size_type bcu_idx = buf[idx];
 bad_char_heuristic[bcu_idx] = size - idx - 1;
   }
-  for (size_t s = 0; s <= (region_size - size);) {
+
+  // Memory we're currently searching through.
+  llvm::SmallVector mem;
+  // Position of the memory buffer.
+  addr_t mem_pos = low;
+  // Maximum number of bytes read (and buffered). We need to read at least
+  // `size` bytes for a successful match.
+  const size_t max_read_size = std::max(size, 0x1);
+
+  for (addr_t s = low; s <= (high - size);) {
+if (s + size > mem.size() + mem_pos) {
+  // We need to read more data. We don't attempt to reuse the data we've
+  // already read (up to `size-1` bytes from `s` to `mem_pos+mem.size()`).
+  // This is fine for patterns much smaller than max_read_size. For very
+  // long patterns we may need to do something more elaborate.
+  mem.resize_for_overwrite(max_read_size);
+  Status error;
+  mem.resize(
+  ReadMemory(s, mem.data(), std::min(mem.size(), high - s), error));
+  mem_pos = s;
+  if (error.Fail() || size > mem.size()) {
+// We didn't read enough data. Skip to the next memory region.
+MemoryRegionInfo info;
+error = GetMemoryRegionInfo(mem_pos + mem.size(), info);
+if (error.Fail())
+  break;
+s = info.GetRange().GetRangeEnd();
+continue;
+  }
+}
 int64_t j = size - 1;
-while (j >= 0 && buf[j] == iterator[s + j])
+while (j >= 0 && buf[j] == mem[s + j - mem_pos])
   j--;
 if (j < 0)
-  return low + s;
-else
-  s += bad_char_heuristic[iterator[s + size - 1]];
+  return s; // We have a match.
+s += bad_char_heuristic[mem[s + size - 1 - mem_pos]];
   }
 
   return LLDB_INVALID_ADDRESS;
diff --git a/lldb/test/API/functionalities/memory/find/TestMemoryFind.py 
b/lldb/test/API/functionalities/memory/find/TestMemoryFind.py
index 09611cc808777d..72acfb3d600701 100644
--- a/lldb/test/API/functionalities/memory/find/TestMemoryFind.py
+++ b/lldb/test/API/functionalities/memory/fin

[Lldb-commits] [lldb] lldb: get lldb API tests working with newer Android NDKs (PR #106443)

2024-08-29 Thread Pavel Labath via lldb-commits


@@ -1,81 +1,55 @@
 NDK_ROOT := $(shell dirname $(CC))/../../../../..
 
-ifeq "$(findstring 64, $(ARCH))" "64"
-   # lowest 64-bit API level
-   API_LEVEL := 21
-else ifeq "$(ARCH)" "i386"
-   # clone(2) declaration is present only since this api level
-   API_LEVEL := 17
+ifeq "$(HOST_OS)" "Linux"
+   HOST_TAG := linux-x86_64
+else ifeq "$(HOST_OS)" "Darwin"
+   HOST_TAG := darwin-x86_64
 else
-   # lowest supported 32-bit API level
-   API_LEVEL := 16
+   HOST_TAG := windows-x86_64
+endif
+
+TOOLCHAIN_SYSROOT := $(NDK_ROOT)/toolchains/llvm/prebuilt/$(HOST_TAG)/sysroot
+
+ifeq "$(wildcard $(TOOLCHAIN_SYSROOT)/.)" ""
+# Compiling test inferiors for Android requires an NDK with the unified
+# toolchain introduced in version r19.
+$(error "No unified toolchain sysroot found in $(NDK_ROOT). NDK must be r19 or 
later.")
 endif
 
 ifeq "$(ARCH)" "arm"
-   SYSROOT_ARCH := arm
-   STL_ARCH := armeabi-v7a
TRIPLE := armv7-none-linux-androideabi
ARCH_CFLAGS += -march=armv7-a -mfloat-abi=softfp -mfpu=vfpv3-d16 -marm
 else ifeq "$(ARCH)" "aarch64"
-   SYSROOT_ARCH := arm64
-   STL_ARCH := arm64-v8a
TRIPLE := aarch64-none-linux-android
 else ifeq "$(ARCH)" "i386"
-   SYSROOT_ARCH := x86
-   STL_ARCH := x86
TRIPLE := i686-none-linux-android
 else
-   SYSROOT_ARCH := $(ARCH)
-   STL_ARCH := $(ARCH)
TRIPLE := $(ARCH)-none-linux-android
 endif
 
-ifeq "$(findstring 86,$(ARCH))" "86"
-   TOOLCHAIN_DIR := $(STL_ARCH)-4.9
-else ifeq "$(ARCH)" "arm"
-   TOOLCHAIN_DIR := arm-linux-androideabi-4.9
-else
-   TOOLCHAIN_DIR := $(subst -none,,$(TRIPLE))-4.9
-endif
+# lowest 64-bit API level
+API_LEVEL := 21
 
 ifeq "$(ARCH)" "arm"
-   TOOL_PREFIX := arm-linux-androideabi
-else
-   TOOL_PREFIX := $(subst -none,,$(TRIPLE))
-endif
-
-ifeq "$(HOST_OS)" "Linux"
-   HOST_TAG := linux-x86_64
-else ifeq "$(HOST_OS)" "Darwin"
-   HOST_TAG := darwin-x86_64
+   ARCH_DIR := arm-linux-androideabi
 else
-   HOST_TAG := windows-x86_64
-endif
-
-GCC_TOOLCHAIN = $(NDK_ROOT)/toolchains/$(TOOLCHAIN_DIR)/prebuilt/$(HOST_TAG)
-
-OBJCOPY ?= $(GCC_TOOLCHAIN)/bin/$(TOOL_PREFIX)-objcopy
-ARCHIVER ?= $(GCC_TOOLCHAIN)/bin/$(TOOL_PREFIX)-ar
-
-ifeq "$(findstring clang,$(CC))" "clang"
-   ARCH_CFLAGS += -target $(TRIPLE) --gcc-toolchain=$(GCC_TOOLCHAIN)
-   ARCH_LDFLAGS += -target $(TRIPLE) --gcc-toolchain=$(GCC_TOOLCHAIN)
+   ARCH_DIR := $(subst -none,,$(TRIPLE))
 endif
 
-ARCH_CFLAGS += --sysroot=$(NDK_ROOT)/sysroot \
-   -isystem $(NDK_ROOT)/sysroot/usr/include/$(TOOL_PREFIX) \
-   -D__ANDROID_API__=$(API_LEVEL) \
-   -isystem 
$(NDK_ROOT)/platforms/android-$(API_LEVEL)/arch-$(SYSROOT_ARCH)/usr/include
-
-ARCH_LDFLAGS += 
--sysroot=$(NDK_ROOT)/platforms/android-$(API_LEVEL)/arch-$(SYSROOT_ARCH) -lm
+ARCH_CFLAGS += \
+   --target=$(TRIPLE) \
+   --sysroot=$(TOOLCHAIN_SYSROOT) \
+   -D __ANDROID_API__=$(API_LEVEL) \
 
 ARCH_CXXFLAGS += \
-   -isystem $(NDK_ROOT)/sources/cxx-stl/llvm-libc++/include \
-   -isystem $(NDK_ROOT)/sources/android/support/include \
-   -isystem $(NDK_ROOT)/sources/cxx-stl/llvm-libc++abi/include
+   -isystem $(TOOLCHAIN_SYSROOT)/usr/include/c++/v1 \
 
 ARCH_LDFLAGS += \
-   -L$(NDK_ROOT)/sources/cxx-stl/llvm-libc++/libs/$(STL_ARCH) \
-   
$(NDK_ROOT)/sources/cxx-stl/llvm-libc++/libs/$(STL_ARCH)/libc++_static.a \
+   --target=$(TRIPLE) \
+   --sysroot=$(TOOLCHAIN_SYSROOT) \
+   --prefix=$(TOOLCHAIN_SYSROOT)/usr/lib/$(ARCH_DIR)/$(API_LEVEL) \
+   
--library-directory=$(TOOLCHAIN_SYSROOT)/usr/lib/$(ARCH_DIR)/$(API_LEVEL) \

labath wrote:

```suggestion
-L$(TOOLCHAIN_SYSROOT)/usr/lib/$(ARCH_DIR)/$(API_LEVEL) \
```

I haven't seen anyone use the long version of this option (I had to look it up 
to be sure of what it does), so I think it'd be better to stick to `-L`. 
`--prefix` is probably ok, since its short form (`-B`) is also not very common.

https://github.com/llvm/llvm-project/pull/106443
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] lldb: get lldb API tests working with newer Android NDKs (PR #106443)

2024-08-29 Thread Pavel Labath via lldb-commits


@@ -1,81 +1,55 @@
 NDK_ROOT := $(shell dirname $(CC))/../../../../..
 
-ifeq "$(findstring 64, $(ARCH))" "64"
-   # lowest 64-bit API level
-   API_LEVEL := 21
-else ifeq "$(ARCH)" "i386"
-   # clone(2) declaration is present only since this api level
-   API_LEVEL := 17
+ifeq "$(HOST_OS)" "Linux"
+   HOST_TAG := linux-x86_64
+else ifeq "$(HOST_OS)" "Darwin"
+   HOST_TAG := darwin-x86_64
 else
-   # lowest supported 32-bit API level
-   API_LEVEL := 16
+   HOST_TAG := windows-x86_64
+endif
+
+TOOLCHAIN_SYSROOT := $(NDK_ROOT)/toolchains/llvm/prebuilt/$(HOST_TAG)/sysroot
+
+ifeq "$(wildcard $(TOOLCHAIN_SYSROOT)/.)" ""
+# Compiling test inferiors for Android requires an NDK with the unified
+# toolchain introduced in version r19.
+$(error "No unified toolchain sysroot found in $(NDK_ROOT). NDK must be r19 or 
later.")
 endif
 
 ifeq "$(ARCH)" "arm"
-   SYSROOT_ARCH := arm
-   STL_ARCH := armeabi-v7a
TRIPLE := armv7-none-linux-androideabi
ARCH_CFLAGS += -march=armv7-a -mfloat-abi=softfp -mfpu=vfpv3-d16 -marm
 else ifeq "$(ARCH)" "aarch64"
-   SYSROOT_ARCH := arm64
-   STL_ARCH := arm64-v8a
TRIPLE := aarch64-none-linux-android
 else ifeq "$(ARCH)" "i386"
-   SYSROOT_ARCH := x86
-   STL_ARCH := x86
TRIPLE := i686-none-linux-android
 else
-   SYSROOT_ARCH := $(ARCH)
-   STL_ARCH := $(ARCH)
TRIPLE := $(ARCH)-none-linux-android
 endif
 
-ifeq "$(findstring 86,$(ARCH))" "86"
-   TOOLCHAIN_DIR := $(STL_ARCH)-4.9
-else ifeq "$(ARCH)" "arm"
-   TOOLCHAIN_DIR := arm-linux-androideabi-4.9
-else
-   TOOLCHAIN_DIR := $(subst -none,,$(TRIPLE))-4.9
-endif
+# lowest 64-bit API level
+API_LEVEL := 21
 
 ifeq "$(ARCH)" "arm"
-   TOOL_PREFIX := arm-linux-androideabi
-else
-   TOOL_PREFIX := $(subst -none,,$(TRIPLE))
-endif
-
-ifeq "$(HOST_OS)" "Linux"
-   HOST_TAG := linux-x86_64
-else ifeq "$(HOST_OS)" "Darwin"
-   HOST_TAG := darwin-x86_64
+   ARCH_DIR := arm-linux-androideabi
 else
-   HOST_TAG := windows-x86_64
-endif
-
-GCC_TOOLCHAIN = $(NDK_ROOT)/toolchains/$(TOOLCHAIN_DIR)/prebuilt/$(HOST_TAG)
-
-OBJCOPY ?= $(GCC_TOOLCHAIN)/bin/$(TOOL_PREFIX)-objcopy
-ARCHIVER ?= $(GCC_TOOLCHAIN)/bin/$(TOOL_PREFIX)-ar
-
-ifeq "$(findstring clang,$(CC))" "clang"
-   ARCH_CFLAGS += -target $(TRIPLE) --gcc-toolchain=$(GCC_TOOLCHAIN)
-   ARCH_LDFLAGS += -target $(TRIPLE) --gcc-toolchain=$(GCC_TOOLCHAIN)
+   ARCH_DIR := $(subst -none,,$(TRIPLE))
 endif
 
-ARCH_CFLAGS += --sysroot=$(NDK_ROOT)/sysroot \
-   -isystem $(NDK_ROOT)/sysroot/usr/include/$(TOOL_PREFIX) \
-   -D__ANDROID_API__=$(API_LEVEL) \
-   -isystem 
$(NDK_ROOT)/platforms/android-$(API_LEVEL)/arch-$(SYSROOT_ARCH)/usr/include
-
-ARCH_LDFLAGS += 
--sysroot=$(NDK_ROOT)/platforms/android-$(API_LEVEL)/arch-$(SYSROOT_ARCH) -lm
+ARCH_CFLAGS += \
+   --target=$(TRIPLE) \
+   --sysroot=$(TOOLCHAIN_SYSROOT) \
+   -D __ANDROID_API__=$(API_LEVEL) \
 
 ARCH_CXXFLAGS += \
-   -isystem $(NDK_ROOT)/sources/cxx-stl/llvm-libc++/include \
-   -isystem $(NDK_ROOT)/sources/android/support/include \
-   -isystem $(NDK_ROOT)/sources/cxx-stl/llvm-libc++abi/include
+   -isystem $(TOOLCHAIN_SYSROOT)/usr/include/c++/v1 \
 
 ARCH_LDFLAGS += \
-   -L$(NDK_ROOT)/sources/cxx-stl/llvm-libc++/libs/$(STL_ARCH) \
-   
$(NDK_ROOT)/sources/cxx-stl/llvm-libc++/libs/$(STL_ARCH)/libc++_static.a \
+   --target=$(TRIPLE) \
+   --sysroot=$(TOOLCHAIN_SYSROOT) \
+   --prefix=$(TOOLCHAIN_SYSROOT)/usr/lib/$(ARCH_DIR)/$(API_LEVEL) \
+   
--library-directory=$(TOOLCHAIN_SYSROOT)/usr/lib/$(ARCH_DIR)/$(API_LEVEL) \
+   $(TOOLCHAIN_SYSROOT)/usr/lib/$(ARCH_DIR)/libc++_static.a \
+   -lm \
-lc++abi \
-   -nostdlib++
+   -nostdlib++ \

labath wrote:

I get the diff benefits of this, but a backslash continuation at the end of a 
file looks very suspicious. I'd just remove all of the final backslashes you 
have here.

https://github.com/llvm/llvm-project/pull/106443
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] lldb: get lldb API tests working with newer Android NDKs (PR #106443)

2024-08-29 Thread Pavel Labath via lldb-commits

https://github.com/labath edited 
https://github.com/llvm/llvm-project/pull/106443
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] lldb: get lldb API tests working with newer Android NDKs (PR #106443)

2024-08-29 Thread Pavel Labath via lldb-commits

https://github.com/labath commented:

This looks great. I don't think anyone will object to removing old ndk support, 
but let's leave this PR open for a while to give people a chance to notice.

https://github.com/llvm/llvm-project/pull/106443
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Add Status::Detail to store expression evaluator diagnostics [… (PR #106442)

2024-08-29 Thread Pavel Labath via lldb-commits

labath wrote:

The llvm::Error class already supports this kind of metadata (and in a much 
better way IMO). Given that moving to llvm::Error is our long term goal, would 
it be possible to port enough APIs so that it's possible to pass an unmodified 
Error object from the producer to the place that needs to consume it?

https://github.com/llvm/llvm-project/pull/106442
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Inline expression evaluator error visualization (PR #106470)

2024-08-29 Thread Pavel Labath via lldb-commits

labath wrote:

This seems like it could be problematic for IDEs, if they don't print the error 
in the same window as the expression being evaluated. The arrows could end up 
pointing to nowhere, or to the wrong place in the expression (if we don't get 
the right offset).

I think this could be made simpler and more robust by just printing a 
reproduction of the expression as the first line of the error message. It 
saving that one line output worth it?

Also, how will this behave for multiline expressions? Or with errors that refer 
to multiple source locations (e.g inside macro expansions)?
```
(lldb) expr
Enter expressions, then terminate with an empty line to evaluate:
  1: #define FOO(x) foo+x 
  2: FOO(bar) 
error: :2:1: use of undeclared identifier 'foo'
2 | FOO(bar)
  | ^
:1:16: expanded from macro 'FOO'
1 | #define FOO(x) foo+x
  |^
error: :2:5: use of undeclared identifier 'bar'
2 | FOO(bar)
  | ^
```

https://github.com/llvm/llvm-project/pull/106470
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-28 Thread Pavel Labath via lldb-commits

labath wrote:

I figured out the problem, and it's kind of hilarious. Turns out (linux?) 
lldb-server does not actually return partial reads (i.e., it behaves like 
windows), but the reason that the test succeeds (for me) is that the holes in 
the allocated memory get filled my memory allocated by lldb. For whatever 
reason (different kernel) this does not happen on the bot. I guess I'll need to 
fix the memory read issues first..

https://github.com/llvm/llvm-project/pull/104193
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-28 Thread Pavel Labath via lldb-commits


@@ -3367,21 +3340,48 @@ lldb::addr_t Process::FindInMemory(lldb::addr_t low, 
lldb::addr_t high,
   if (region_size < size)
 return LLDB_INVALID_ADDRESS;
 
+  // See "Boyer-Moore string search algorithm".
   std::vector bad_char_heuristic(256, size);
-  ProcessMemoryIterator iterator(*this, low);
-
   for (size_t idx = 0; idx < size - 1; idx++) {
 decltype(bad_char_heuristic)::size_type bcu_idx = buf[idx];
 bad_char_heuristic[bcu_idx] = size - idx - 1;
   }
-  for (size_t s = 0; s <= (region_size - size);) {
+
+  // Memory we're currently searching through.
+  llvm::SmallVector mem;
+  // Position of the memory buffer.
+  addr_t mem_pos = low;
+  // Maximum number of bytes read (and buffered). We need to read at least
+  // `size` bytes for a successful match.
+  const size_t max_read_size = std::max(size, 0x1);
+
+  for (addr_t s = low; s <= (high - size);) {
+if (s + size > mem.size() + mem_pos) {
+  // We need to read more data. We don't attempt to reuse the data we've
+  // already read (up to `size-1` bytes from `s` to `mem_pos+mem.size()`).
+  // This is fine for patterns much smaller than max_read_size. For very
+  // long patterns we may need to do something more elaborate.
+  mem.resize_for_overwrite(max_read_size);
+  Status error;
+  mem.resize(
+  ReadMemory(s, mem.data(), std::min(mem.size(), high - s), error));
+  mem_pos = s;
+  if (error.Fail() || size > mem.size()) {
+// We didn't read enough data. Skip to the next memory region.
+MemoryRegionInfo info;
+error = GetMemoryRegionInfo(mem_pos + mem.size(), info);
+if (error.Fail())
+  break;
+s = info.GetRange().GetRangeEnd();

labath wrote:

Correct. mem_pos is updated when changing the `mem` array, which we're not 
doing here. That will happen on the next iteration of the loop when we realize 
that the mem buffer does not hold the thing we're interested in.

https://github.com/llvm/llvm-project/pull/104193
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-28 Thread Pavel Labath via lldb-commits


@@ -1,9 +1,76 @@
-#include 
-#include 
-
-int main (int argc, char const *argv[])
-{
-const char* stringdata = "hello world; I like to write text in const char 
pointers";
-uint8_t bytedata[] = 
{0xAA,0xBB,0xCC,0xDD,0xEE,0xFF,0x00,0x11,0x22,0x33,0x44,0x55,0x66,0x77,0x88,0x99};
-return 0; // break here
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef _WIN32
+#include "Windows.h"
+
+int getpagesize() {
+  SYSTEM_INFO system_info;
+  GetSystemInfo(&system_info);
+  return system_info.dwPageSize;
+}
+
+char *allocate_memory_with_holes() {
+  int pagesize = getpagesize();
+  void *mem = VirtualAlloc(nullptr, 5 * pagesize, MEM_RESERVE, PAGE_NOACCESS);
+  if (!mem) {
+std::cerr << std::system_category().message(GetLastError()) << std::endl;
+exit(1);
+  }
+  char *bytes = static_cast(mem);
+  for (int page : {0, 2, 4}) {
+if (!VirtualAlloc(bytes + page * pagesize, pagesize, MEM_COMMIT,

labath wrote:

Added a comment at the top. Definitely not a Windows expert here, but my 
understanding is that the main difference is due to the MEM_RESERVE/MEM_COMMIT. 
MEM_RESERVE does not actually allocate any memory. It just marks the range as 
in use ("reserved") so that it is won't be taken by other allocation calls. 
MEM_COMMIT is what actually allocates the page.

https://github.com/llvm/llvm-project/pull/104193
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-28 Thread Pavel Labath via lldb-commits


@@ -3367,21 +3340,48 @@ lldb::addr_t Process::FindInMemory(lldb::addr_t low, 
lldb::addr_t high,
   if (region_size < size)
 return LLDB_INVALID_ADDRESS;
 
+  // See "Boyer-Moore string search algorithm".
   std::vector bad_char_heuristic(256, size);
-  ProcessMemoryIterator iterator(*this, low);
-
   for (size_t idx = 0; idx < size - 1; idx++) {
 decltype(bad_char_heuristic)::size_type bcu_idx = buf[idx];
 bad_char_heuristic[bcu_idx] = size - idx - 1;
   }
-  for (size_t s = 0; s <= (region_size - size);) {
+
+  // Memory we're currently searching through.
+  llvm::SmallVector mem;
+  // Position of the memory buffer.
+  addr_t mem_pos = low;
+  // Maximum number of bytes read (and buffered). We need to read at least
+  // `size` bytes for a successful match.
+  const size_t max_read_size = std::max(size, 0x1);
+
+  for (addr_t s = low; s <= (high - size);) {

labath wrote:

How about cur_addr? :) (`s` was a leftover from the original implementation, 
but I've now completely changed it anyway).

https://github.com/llvm/llvm-project/pull/104193
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-28 Thread Pavel Labath via lldb-commits


@@ -3367,21 +3340,48 @@ lldb::addr_t Process::FindInMemory(lldb::addr_t low, 
lldb::addr_t high,
   if (region_size < size)
 return LLDB_INVALID_ADDRESS;
 
+  // See "Boyer-Moore string search algorithm".
   std::vector bad_char_heuristic(256, size);
-  ProcessMemoryIterator iterator(*this, low);
-
   for (size_t idx = 0; idx < size - 1; idx++) {
 decltype(bad_char_heuristic)::size_type bcu_idx = buf[idx];
 bad_char_heuristic[bcu_idx] = size - idx - 1;
   }
-  for (size_t s = 0; s <= (region_size - size);) {
+
+  // Memory we're currently searching through.
+  llvm::SmallVector mem;
+  // Position of the memory buffer.
+  addr_t mem_pos = low;
+  // Maximum number of bytes read (and buffered). We need to read at least
+  // `size` bytes for a successful match.
+  const size_t max_read_size = std::max(size, 0x1);
+
+  for (addr_t s = low; s <= (high - size);) {
+if (s + size > mem.size() + mem_pos) {

labath wrote:

good idea

https://github.com/llvm/llvm-project/pull/104193
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-28 Thread Pavel Labath via lldb-commits


@@ -79,3 +83,25 @@ def test_memory_find(self):
 'memory find -s "nothere" `stringdata` `stringdata+10`',
 substrs=["data not found within the range."],
 )
+
+@expectedFailureWindows
+def test_memory_find_with_holes(self):
+self._prepare_inferior()
+
+self.runCmd("log enable gdb-remote packets")
+self.runCmd("log enable lldb process target")
+
+pagesize = self.frame().FindVariable("pagesize").GetValueAsUnsigned()
+mem_with_holes = (
+self.frame().FindVariable("mem_with_holes").GetValueAsUnsigned()
+)
+matches_var = self.frame().FindVariable("matches")
+self.assertEqual(matches_var.GetNumChildren(), 4)
+matches = [
+f"data found at location: 
{matches_var.GetChildAtIndex(i).GetValueAsUnsigned():#x}"
+for i in range(4)
+]
+self.expect(
+'memory find -c 5 -s "needle" `mem_with_holes` 
`mem_with_holes+5*pagesize`',

labath wrote:

Done (the first part).

https://github.com/llvm/llvm-project/pull/104193
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-28 Thread Pavel Labath via lldb-commits

https://github.com/labath updated 
https://github.com/llvm/llvm-project/pull/104193

>From 7b8b8b699902d2365ea43e5ae015546c4d20fac8 Mon Sep 17 00:00:00 2001
From: Pavel Labath 
Date: Wed, 14 Aug 2024 19:58:27 +0200
Subject: [PATCH 1/4] [lldb] Fix and speedup the `memory find` command

This patch fixes an issue where the `memory find` command would
effectively stop searching after encountering a memory read error (which
could happen due to unreadable memory), without giving any indication
that it has done so (it would just print it could not find the pattern).

To make matters worse, it would not terminate after encountering this
error, but rather proceed to slowly increment the address pointer, which
meant that searching a large region could take a very long time (and
give the appearance that lldb is actually searching for the thing).

The patch fixes this first problem (*) by detecting read errors and
skipping over (using GetMemoryRegionInfo) the unreadable parts of memory
and resuming the search after them. It also reads the memory in bulk (up
to 1MB), which speeds up the search significantly (up to 6x for live
processes, 18x for core files).

(*) The fix does not work on windows yet, because the ReadMemory API
does not return partial results (like it does for other systems). I'm
preparing a separate patch to deal with that.
---
 lldb/source/Target/Process.cpp| 68 +++
 .../memory/find/TestMemoryFind.py | 30 ++-
 .../API/functionalities/memory/find/main.cpp  | 83 +--
 3 files changed, 136 insertions(+), 45 deletions(-)

diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index b2a0f13b9a1549..5d066d264bd3f5 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -114,33 +114,6 @@ class ProcessOptionValueProperties
   }
 };
 
-class ProcessMemoryIterator {
-public:
-  ProcessMemoryIterator(Process &process, lldb::addr_t base)
-  : m_process(process), m_base_addr(base) {}
-
-  bool IsValid() { return m_is_valid; }
-
-  uint8_t operator[](lldb::addr_t offset) {
-if (!IsValid())
-  return 0;
-
-uint8_t retval = 0;
-Status error;
-if (0 == m_process.ReadMemory(m_base_addr + offset, &retval, 1, error)) {
-  m_is_valid = false;
-  return 0;
-}
-
-return retval;
-  }
-
-private:
-  Process &m_process;
-  const lldb::addr_t m_base_addr;
-  bool m_is_valid = true;
-};
-
 static constexpr OptionEnumValueElement g_follow_fork_mode_values[] = {
 {
 eFollowParent,
@@ -3367,21 +3340,48 @@ lldb::addr_t Process::FindInMemory(lldb::addr_t low, 
lldb::addr_t high,
   if (region_size < size)
 return LLDB_INVALID_ADDRESS;
 
+  // See "Boyer-Moore string search algorithm".
   std::vector bad_char_heuristic(256, size);
-  ProcessMemoryIterator iterator(*this, low);
-
   for (size_t idx = 0; idx < size - 1; idx++) {
 decltype(bad_char_heuristic)::size_type bcu_idx = buf[idx];
 bad_char_heuristic[bcu_idx] = size - idx - 1;
   }
-  for (size_t s = 0; s <= (region_size - size);) {
+
+  // Memory we're currently searching through.
+  llvm::SmallVector mem;
+  // Position of the memory buffer.
+  addr_t mem_pos = low;
+  // Maximum number of bytes read (and buffered). We need to read at least
+  // `size` bytes for a successful match.
+  const size_t max_read_size = std::max(size, 0x1);
+
+  for (addr_t s = low; s <= (high - size);) {
+if (s + size > mem.size() + mem_pos) {
+  // We need to read more data. We don't attempt to reuse the data we've
+  // already read (up to `size-1` bytes from `s` to `mem_pos+mem.size()`).
+  // This is fine for patterns much smaller than max_read_size. For very
+  // long patterns we may need to do something more elaborate.
+  mem.resize_for_overwrite(max_read_size);
+  Status error;
+  mem.resize(
+  ReadMemory(s, mem.data(), std::min(mem.size(), high - s), error));
+  mem_pos = s;
+  if (error.Fail() || size > mem.size()) {
+// We didn't read enough data. Skip to the next memory region.
+MemoryRegionInfo info;
+error = GetMemoryRegionInfo(mem_pos + mem.size(), info);
+if (error.Fail())
+  break;
+s = info.GetRange().GetRangeEnd();
+continue;
+  }
+}
 int64_t j = size - 1;
-while (j >= 0 && buf[j] == iterator[s + j])
+while (j >= 0 && buf[j] == mem[s + j - mem_pos])
   j--;
 if (j < 0)
-  return low + s;
-else
-  s += bad_char_heuristic[iterator[s + size - 1]];
+  return s; // We have a match.
+s += bad_char_heuristic[mem[s + size - 1 - mem_pos]];
   }
 
   return LLDB_INVALID_ADDRESS;
diff --git a/lldb/test/API/functionalities/memory/find/TestMemoryFind.py 
b/lldb/test/API/functionalities/memory/find/TestMemoryFind.py
index 09611cc808777d..72acfb3d600701 100644
--- a/lldb/test/API/functionalities/memory/find/TestMemoryFind.py
+++ b/lldb/test/API/functionalities/memory/fin

[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-28 Thread Pavel Labath via lldb-commits


@@ -3367,21 +3340,48 @@ lldb::addr_t Process::FindInMemory(lldb::addr_t low, 
lldb::addr_t high,
   if (region_size < size)
 return LLDB_INVALID_ADDRESS;
 
+  // See "Boyer-Moore string search algorithm".
   std::vector bad_char_heuristic(256, size);
-  ProcessMemoryIterator iterator(*this, low);
-
   for (size_t idx = 0; idx < size - 1; idx++) {
 decltype(bad_char_heuristic)::size_type bcu_idx = buf[idx];
 bad_char_heuristic[bcu_idx] = size - idx - 1;
   }
-  for (size_t s = 0; s <= (region_size - size);) {
+
+  // Memory we're currently searching through.
+  llvm::SmallVector mem;

labath wrote:

I'm using the SmallVector for the `resize_for_overwrite` functionality 
(allocating memory without initializing it).

https://github.com/llvm/llvm-project/pull/104193
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-28 Thread Pavel Labath via lldb-commits

labath wrote:

> > It also reads the memory in bulk (up to 1MB)
> 
> ```
> // Maximum number of bytes read (and buffered). We need to read at least
> // `size` bytes for a successful match.
> const size_t max_read_size = std::max(size, 0x1);
> ```
> 
> It seems the minimal chunk is 64KB and the maximal chunk may be very long 
> (more than 1MB).

Depends on how you look at it, I guess. The actual read size will be the 
minimum of this number and the "remainder of memory to be searched" 
(`std::min(mem.size(), high - s)`), so `max_read_size` is really a cap on the 
read size. It's true that the value can be bigger than 64K (or even 1MB). 
However that can only happen if the string being searched is larger than that 
value. This is necessary because the search algorithm matches from the end of 
the pattern.

I wanted to keep the algorithm simple and keep the entire (potential) match in 
memory at once, as trying to read in the middle of matching would get messy. I 
don't think anyone searches for strings this long, but if we do want to 
(better) support that use case, I think that the best solution would be to 
switch to an algorithm (e.g. 
[KMP](https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm))
 which scans the search space in a strictly linear fashion -- as that will make 
it easy to read the memory in arbitrary increments.

> 
> Something is wrong with the first memory block in tests
> 
> ```
> Got output:
> data found at location: 0x7f6bdf3eb000
> 0x7f6bdf3eb000: 6e 65 65 64 6c 65 00 00 00 00 00 00 00 00 00 00  
> needle..
> 0x7f6bdf3eb010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 
> data found at location: 0x7f6bdf3eb800
> 0x7f6bdf3eb800: 6e 65 65 64 6c 65 00 00 00 00 00 00 00 00 00 00  
> needle..
> 0x7f6bdf3eb810: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 
> data found at location: 0x7f6bdf3edff9
> 0x7f6bdf3edff9: 6e 65 65 64 6c 65 00 50 e0 3e df 6b 7f 00 00 6d  
> needle.P.>.k...m
> 0x7f6bdf3ee009: dd 41 df 6b 7f 00 00 00 00 00 00 00 00 00 00 40  
> .A.k...@
> no more matches within the range.
> 
> Expecting sub string: "data found at location: 0x7f6bdf3e9000" (was not found)
> ```
> 

Interesting. It does work on my machine. I'll try to debug this.

> Note the capacity=0 is a special case here `llvm::SmallVector 
> mem;` I'd recommend
> 
> ```
> const size_t min_read_size = 0x1;
> const size_t max_read_size = std::max(size, min_read_size);
> llvm::SmallVector mem;
> ```

That would allocate the 64k on the stack, which isn't good practice (I doubt it 
will make things faster, and it can cause us to blow the stack). 64k is  1000x 
larger than the [preffered small vector 
size](https://github.com/llvm/llvm-project/blob/16910a21ee0fabab2df291e4e5bc18289bd5762d/llvm/include/llvm/ADT/SmallVector.h#L1150C27-L1150C54).
 In practice there will only ever be one heap allocation here -- first `resize` 
call on the vector will allocate a ~64k heap block, and all subsequent resizes 
will just move the end pointer around.

https://github.com/llvm/llvm-project/pull/104193
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-28 Thread Pavel Labath via lldb-commits

https://github.com/labath edited 
https://github.com/llvm/llvm-project/pull/104193
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-28 Thread Pavel Labath via lldb-commits

https://github.com/labath updated 
https://github.com/llvm/llvm-project/pull/104193

>From 7b8b8b699902d2365ea43e5ae015546c4d20fac8 Mon Sep 17 00:00:00 2001
From: Pavel Labath 
Date: Wed, 14 Aug 2024 19:58:27 +0200
Subject: [PATCH 1/3] [lldb] Fix and speedup the `memory find` command

This patch fixes an issue where the `memory find` command would
effectively stop searching after encountering a memory read error (which
could happen due to unreadable memory), without giving any indication
that it has done so (it would just print it could not find the pattern).

To make matters worse, it would not terminate after encountering this
error, but rather proceed to slowly increment the address pointer, which
meant that searching a large region could take a very long time (and
give the appearance that lldb is actually searching for the thing).

The patch fixes this first problem (*) by detecting read errors and
skipping over (using GetMemoryRegionInfo) the unreadable parts of memory
and resuming the search after them. It also reads the memory in bulk (up
to 1MB), which speeds up the search significantly (up to 6x for live
processes, 18x for core files).

(*) The fix does not work on windows yet, because the ReadMemory API
does not return partial results (like it does for other systems). I'm
preparing a separate patch to deal with that.
---
 lldb/source/Target/Process.cpp| 68 +++
 .../memory/find/TestMemoryFind.py | 30 ++-
 .../API/functionalities/memory/find/main.cpp  | 83 +--
 3 files changed, 136 insertions(+), 45 deletions(-)

diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index b2a0f13b9a1549..5d066d264bd3f5 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -114,33 +114,6 @@ class ProcessOptionValueProperties
   }
 };
 
-class ProcessMemoryIterator {
-public:
-  ProcessMemoryIterator(Process &process, lldb::addr_t base)
-  : m_process(process), m_base_addr(base) {}
-
-  bool IsValid() { return m_is_valid; }
-
-  uint8_t operator[](lldb::addr_t offset) {
-if (!IsValid())
-  return 0;
-
-uint8_t retval = 0;
-Status error;
-if (0 == m_process.ReadMemory(m_base_addr + offset, &retval, 1, error)) {
-  m_is_valid = false;
-  return 0;
-}
-
-return retval;
-  }
-
-private:
-  Process &m_process;
-  const lldb::addr_t m_base_addr;
-  bool m_is_valid = true;
-};
-
 static constexpr OptionEnumValueElement g_follow_fork_mode_values[] = {
 {
 eFollowParent,
@@ -3367,21 +3340,48 @@ lldb::addr_t Process::FindInMemory(lldb::addr_t low, 
lldb::addr_t high,
   if (region_size < size)
 return LLDB_INVALID_ADDRESS;
 
+  // See "Boyer-Moore string search algorithm".
   std::vector bad_char_heuristic(256, size);
-  ProcessMemoryIterator iterator(*this, low);
-
   for (size_t idx = 0; idx < size - 1; idx++) {
 decltype(bad_char_heuristic)::size_type bcu_idx = buf[idx];
 bad_char_heuristic[bcu_idx] = size - idx - 1;
   }
-  for (size_t s = 0; s <= (region_size - size);) {
+
+  // Memory we're currently searching through.
+  llvm::SmallVector mem;
+  // Position of the memory buffer.
+  addr_t mem_pos = low;
+  // Maximum number of bytes read (and buffered). We need to read at least
+  // `size` bytes for a successful match.
+  const size_t max_read_size = std::max(size, 0x1);
+
+  for (addr_t s = low; s <= (high - size);) {
+if (s + size > mem.size() + mem_pos) {
+  // We need to read more data. We don't attempt to reuse the data we've
+  // already read (up to `size-1` bytes from `s` to `mem_pos+mem.size()`).
+  // This is fine for patterns much smaller than max_read_size. For very
+  // long patterns we may need to do something more elaborate.
+  mem.resize_for_overwrite(max_read_size);
+  Status error;
+  mem.resize(
+  ReadMemory(s, mem.data(), std::min(mem.size(), high - s), error));
+  mem_pos = s;
+  if (error.Fail() || size > mem.size()) {
+// We didn't read enough data. Skip to the next memory region.
+MemoryRegionInfo info;
+error = GetMemoryRegionInfo(mem_pos + mem.size(), info);
+if (error.Fail())
+  break;
+s = info.GetRange().GetRangeEnd();
+continue;
+  }
+}
 int64_t j = size - 1;
-while (j >= 0 && buf[j] == iterator[s + j])
+while (j >= 0 && buf[j] == mem[s + j - mem_pos])
   j--;
 if (j < 0)
-  return low + s;
-else
-  s += bad_char_heuristic[iterator[s + size - 1]];
+  return s; // We have a match.
+s += bad_char_heuristic[mem[s + size - 1 - mem_pos]];
   }
 
   return LLDB_INVALID_ADDRESS;
diff --git a/lldb/test/API/functionalities/memory/find/TestMemoryFind.py 
b/lldb/test/API/functionalities/memory/find/TestMemoryFind.py
index 09611cc808777d..72acfb3d600701 100644
--- a/lldb/test/API/functionalities/memory/find/TestMemoryFind.py
+++ b/lldb/test/API/functionalities/memory/fin

[Lldb-commits] [lldb] [lldb] Turn lldb_private::Status into a value type. (PR #106163)

2024-08-28 Thread Pavel Labath via lldb-commits

labath wrote:

> > * since this seems like a perfect opportunity to bikesh^Wdiscuss the names, 
> > I'm going to ask if there's any appetite for shortening some of the new 
> > factory functions. `Status::FromErrorStringWithFormatv` is a bit of a 
> > mouthful, so I was thinking if we could  use something shorter instead 
> > (`Status::FromFormatv` or even `Status::Formatv`) ?
> 
> I picked these names, because they are in line with the old names, which made 
> the regex replacement feasible. Renaming them afterwards is going to be 
> easier. My 2 cents on the naming: I had `Status::FromFormatv` in a previous 
> iteration of this patch and changed my mind, because it doesn't indicate that 
> this is going to be an error. What do you think about the slightly shorter 
> `Status::ErrorFromFromatv()`?

I think it's better (because its shorter), though it's still a bit long for my 
taste. FWIW, I don't find the short name at all confusing, because my mental 
model of these error types is (and all types I know behave that way) is that 
they have only one "success" value, and everything else represents some kind of 
an error/failure. So, when I see anything more complicated than `Status()` (or 
whatever the final API is for constructing success values), I immediately 
expect an error status.

I guess one difference is that I'm used to working with `absl::Status` (which I 
believe was zturner's inspiration for this name), so "status" feels just like a 
synonym for "error" to me. If you think having the word "error" in the name 
helps, I'd also be fine with `Status::Errorf` and `Status::Errorv` (for printf 
vs formatv styles) or something like that.

https://github.com/llvm/llvm-project/pull/106163
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-27 Thread Pavel Labath via lldb-commits

https://github.com/labath updated 
https://github.com/llvm/llvm-project/pull/104193

>From 7b8b8b699902d2365ea43e5ae015546c4d20fac8 Mon Sep 17 00:00:00 2001
From: Pavel Labath 
Date: Wed, 14 Aug 2024 19:58:27 +0200
Subject: [PATCH 1/2] [lldb] Fix and speedup the `memory find` command

This patch fixes an issue where the `memory find` command would
effectively stop searching after encountering a memory read error (which
could happen due to unreadable memory), without giving any indication
that it has done so (it would just print it could not find the pattern).

To make matters worse, it would not terminate after encountering this
error, but rather proceed to slowly increment the address pointer, which
meant that searching a large region could take a very long time (and
give the appearance that lldb is actually searching for the thing).

The patch fixes this first problem (*) by detecting read errors and
skipping over (using GetMemoryRegionInfo) the unreadable parts of memory
and resuming the search after them. It also reads the memory in bulk (up
to 1MB), which speeds up the search significantly (up to 6x for live
processes, 18x for core files).

(*) The fix does not work on windows yet, because the ReadMemory API
does not return partial results (like it does for other systems). I'm
preparing a separate patch to deal with that.
---
 lldb/source/Target/Process.cpp| 68 +++
 .../memory/find/TestMemoryFind.py | 30 ++-
 .../API/functionalities/memory/find/main.cpp  | 83 +--
 3 files changed, 136 insertions(+), 45 deletions(-)

diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index b2a0f13b9a1549..5d066d264bd3f5 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -114,33 +114,6 @@ class ProcessOptionValueProperties
   }
 };
 
-class ProcessMemoryIterator {
-public:
-  ProcessMemoryIterator(Process &process, lldb::addr_t base)
-  : m_process(process), m_base_addr(base) {}
-
-  bool IsValid() { return m_is_valid; }
-
-  uint8_t operator[](lldb::addr_t offset) {
-if (!IsValid())
-  return 0;
-
-uint8_t retval = 0;
-Status error;
-if (0 == m_process.ReadMemory(m_base_addr + offset, &retval, 1, error)) {
-  m_is_valid = false;
-  return 0;
-}
-
-return retval;
-  }
-
-private:
-  Process &m_process;
-  const lldb::addr_t m_base_addr;
-  bool m_is_valid = true;
-};
-
 static constexpr OptionEnumValueElement g_follow_fork_mode_values[] = {
 {
 eFollowParent,
@@ -3367,21 +3340,48 @@ lldb::addr_t Process::FindInMemory(lldb::addr_t low, 
lldb::addr_t high,
   if (region_size < size)
 return LLDB_INVALID_ADDRESS;
 
+  // See "Boyer-Moore string search algorithm".
   std::vector bad_char_heuristic(256, size);
-  ProcessMemoryIterator iterator(*this, low);
-
   for (size_t idx = 0; idx < size - 1; idx++) {
 decltype(bad_char_heuristic)::size_type bcu_idx = buf[idx];
 bad_char_heuristic[bcu_idx] = size - idx - 1;
   }
-  for (size_t s = 0; s <= (region_size - size);) {
+
+  // Memory we're currently searching through.
+  llvm::SmallVector mem;
+  // Position of the memory buffer.
+  addr_t mem_pos = low;
+  // Maximum number of bytes read (and buffered). We need to read at least
+  // `size` bytes for a successful match.
+  const size_t max_read_size = std::max(size, 0x1);
+
+  for (addr_t s = low; s <= (high - size);) {
+if (s + size > mem.size() + mem_pos) {
+  // We need to read more data. We don't attempt to reuse the data we've
+  // already read (up to `size-1` bytes from `s` to `mem_pos+mem.size()`).
+  // This is fine for patterns much smaller than max_read_size. For very
+  // long patterns we may need to do something more elaborate.
+  mem.resize_for_overwrite(max_read_size);
+  Status error;
+  mem.resize(
+  ReadMemory(s, mem.data(), std::min(mem.size(), high - s), error));
+  mem_pos = s;
+  if (error.Fail() || size > mem.size()) {
+// We didn't read enough data. Skip to the next memory region.
+MemoryRegionInfo info;
+error = GetMemoryRegionInfo(mem_pos + mem.size(), info);
+if (error.Fail())
+  break;
+s = info.GetRange().GetRangeEnd();
+continue;
+  }
+}
 int64_t j = size - 1;
-while (j >= 0 && buf[j] == iterator[s + j])
+while (j >= 0 && buf[j] == mem[s + j - mem_pos])
   j--;
 if (j < 0)
-  return low + s;
-else
-  s += bad_char_heuristic[iterator[s + size - 1]];
+  return s; // We have a match.
+s += bad_char_heuristic[mem[s + size - 1 - mem_pos]];
   }
 
   return LLDB_INVALID_ADDRESS;
diff --git a/lldb/test/API/functionalities/memory/find/TestMemoryFind.py 
b/lldb/test/API/functionalities/memory/find/TestMemoryFind.py
index 09611cc808777d..72acfb3d600701 100644
--- a/lldb/test/API/functionalities/memory/find/TestMemoryFind.py
+++ b/lldb/test/API/functionalities/memory/fin

[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-27 Thread Pavel Labath via lldb-commits

https://github.com/labath ready_for_review 
https://github.com/llvm/llvm-project/pull/104193
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-27 Thread Pavel Labath via lldb-commits

https://github.com/labath updated 
https://github.com/llvm/llvm-project/pull/104193

>From 7b8b8b699902d2365ea43e5ae015546c4d20fac8 Mon Sep 17 00:00:00 2001
From: Pavel Labath 
Date: Wed, 14 Aug 2024 19:58:27 +0200
Subject: [PATCH] [lldb] Fix and speedup the `memory find` command

This patch fixes an issue where the `memory find` command would
effectively stop searching after encountering a memory read error (which
could happen due to unreadable memory), without giving any indication
that it has done so (it would just print it could not find the pattern).

To make matters worse, it would not terminate after encountering this
error, but rather proceed to slowly increment the address pointer, which
meant that searching a large region could take a very long time (and
give the appearance that lldb is actually searching for the thing).

The patch fixes this first problem (*) by detecting read errors and
skipping over (using GetMemoryRegionInfo) the unreadable parts of memory
and resuming the search after them. It also reads the memory in bulk (up
to 1MB), which speeds up the search significantly (up to 6x for live
processes, 18x for core files).

(*) The fix does not work on windows yet, because the ReadMemory API
does not return partial results (like it does for other systems). I'm
preparing a separate patch to deal with that.
---
 lldb/source/Target/Process.cpp| 68 +++
 .../memory/find/TestMemoryFind.py | 30 ++-
 .../API/functionalities/memory/find/main.cpp  | 83 +--
 3 files changed, 136 insertions(+), 45 deletions(-)

diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index b2a0f13b9a1549..5d066d264bd3f5 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -114,33 +114,6 @@ class ProcessOptionValueProperties
   }
 };
 
-class ProcessMemoryIterator {
-public:
-  ProcessMemoryIterator(Process &process, lldb::addr_t base)
-  : m_process(process), m_base_addr(base) {}
-
-  bool IsValid() { return m_is_valid; }
-
-  uint8_t operator[](lldb::addr_t offset) {
-if (!IsValid())
-  return 0;
-
-uint8_t retval = 0;
-Status error;
-if (0 == m_process.ReadMemory(m_base_addr + offset, &retval, 1, error)) {
-  m_is_valid = false;
-  return 0;
-}
-
-return retval;
-  }
-
-private:
-  Process &m_process;
-  const lldb::addr_t m_base_addr;
-  bool m_is_valid = true;
-};
-
 static constexpr OptionEnumValueElement g_follow_fork_mode_values[] = {
 {
 eFollowParent,
@@ -3367,21 +3340,48 @@ lldb::addr_t Process::FindInMemory(lldb::addr_t low, 
lldb::addr_t high,
   if (region_size < size)
 return LLDB_INVALID_ADDRESS;
 
+  // See "Boyer-Moore string search algorithm".
   std::vector bad_char_heuristic(256, size);
-  ProcessMemoryIterator iterator(*this, low);
-
   for (size_t idx = 0; idx < size - 1; idx++) {
 decltype(bad_char_heuristic)::size_type bcu_idx = buf[idx];
 bad_char_heuristic[bcu_idx] = size - idx - 1;
   }
-  for (size_t s = 0; s <= (region_size - size);) {
+
+  // Memory we're currently searching through.
+  llvm::SmallVector mem;
+  // Position of the memory buffer.
+  addr_t mem_pos = low;
+  // Maximum number of bytes read (and buffered). We need to read at least
+  // `size` bytes for a successful match.
+  const size_t max_read_size = std::max(size, 0x1);
+
+  for (addr_t s = low; s <= (high - size);) {
+if (s + size > mem.size() + mem_pos) {
+  // We need to read more data. We don't attempt to reuse the data we've
+  // already read (up to `size-1` bytes from `s` to `mem_pos+mem.size()`).
+  // This is fine for patterns much smaller than max_read_size. For very
+  // long patterns we may need to do something more elaborate.
+  mem.resize_for_overwrite(max_read_size);
+  Status error;
+  mem.resize(
+  ReadMemory(s, mem.data(), std::min(mem.size(), high - s), error));
+  mem_pos = s;
+  if (error.Fail() || size > mem.size()) {
+// We didn't read enough data. Skip to the next memory region.
+MemoryRegionInfo info;
+error = GetMemoryRegionInfo(mem_pos + mem.size(), info);
+if (error.Fail())
+  break;
+s = info.GetRange().GetRangeEnd();
+continue;
+  }
+}
 int64_t j = size - 1;
-while (j >= 0 && buf[j] == iterator[s + j])
+while (j >= 0 && buf[j] == mem[s + j - mem_pos])
   j--;
 if (j < 0)
-  return low + s;
-else
-  s += bad_char_heuristic[iterator[s + size - 1]];
+  return s; // We have a match.
+s += bad_char_heuristic[mem[s + size - 1 - mem_pos]];
   }
 
   return LLDB_INVALID_ADDRESS;
diff --git a/lldb/test/API/functionalities/memory/find/TestMemoryFind.py 
b/lldb/test/API/functionalities/memory/find/TestMemoryFind.py
index 09611cc808777d..72acfb3d600701 100644
--- a/lldb/test/API/functionalities/memory/find/TestMemoryFind.py
+++ b/lldb/test/API/functionalities/memory/find/Te

[Lldb-commits] [lldb] [lldb] Fix and speedup the `memory find` command (PR #104193)

2024-08-27 Thread Pavel Labath via lldb-commits

https://github.com/labath edited 
https://github.com/llvm/llvm-project/pull/104193
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][WIP] memory find speedup+bugfix (PR #104193)

2024-08-27 Thread Pavel Labath via lldb-commits

https://github.com/labath edited 
https://github.com/llvm/llvm-project/pull/104193
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][WIP] memory find speedup+bugfix (PR #104193)

2024-08-27 Thread Pavel Labath via lldb-commits

https://github.com/labath updated 
https://github.com/llvm/llvm-project/pull/104193

>From b46820a91f1ad8d6db46210e3135abdb1ab475e8 Mon Sep 17 00:00:00 2001
From: Pavel Labath 
Date: Wed, 14 Aug 2024 19:58:27 +0200
Subject: [PATCH] [lldb] Fix and speedup the `memory find` command

This patch fixes an issue where the `memory find` command would
effectively stop searching after encountering a memory read error (which
could happen due to unreadable memory), without giving any indication
that it has done so (it would just print it could not find the pattern).

To make matters worse, it would not terminate after encountering this
error, but rather proceed to slowly increment the address pointer, which
meant that searching a large region could take a very long time (and
give the appearance that lldb is actually searching for the thing).

The patch fixes this first problem (*) by detecting read errors and
skipping over (using GetMemoryRegionInfo) the unreadable parts of memory
and resuming the search after them. It also reads the memory in bulk (up
to 1MB), which speeds up the search significantly (up to 6x for live
processes, 18x for core files).

(*) The fix does not work on windows yet, because the ReadMemory API
does not return partial results (like it does for other systems). I'm
preparing a separate patch to deal with that.
---
 lldb/source/Target/Process.cpp| 68 +++
 .../memory/find/TestMemoryFind.py | 25 +-
 .../API/functionalities/memory/find/main.cpp  | 83 +--
 3 files changed, 131 insertions(+), 45 deletions(-)

diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index b2a0f13b9a1549..5d066d264bd3f5 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -114,33 +114,6 @@ class ProcessOptionValueProperties
   }
 };
 
-class ProcessMemoryIterator {
-public:
-  ProcessMemoryIterator(Process &process, lldb::addr_t base)
-  : m_process(process), m_base_addr(base) {}
-
-  bool IsValid() { return m_is_valid; }
-
-  uint8_t operator[](lldb::addr_t offset) {
-if (!IsValid())
-  return 0;
-
-uint8_t retval = 0;
-Status error;
-if (0 == m_process.ReadMemory(m_base_addr + offset, &retval, 1, error)) {
-  m_is_valid = false;
-  return 0;
-}
-
-return retval;
-  }
-
-private:
-  Process &m_process;
-  const lldb::addr_t m_base_addr;
-  bool m_is_valid = true;
-};
-
 static constexpr OptionEnumValueElement g_follow_fork_mode_values[] = {
 {
 eFollowParent,
@@ -3367,21 +3340,48 @@ lldb::addr_t Process::FindInMemory(lldb::addr_t low, 
lldb::addr_t high,
   if (region_size < size)
 return LLDB_INVALID_ADDRESS;
 
+  // See "Boyer-Moore string search algorithm".
   std::vector bad_char_heuristic(256, size);
-  ProcessMemoryIterator iterator(*this, low);
-
   for (size_t idx = 0; idx < size - 1; idx++) {
 decltype(bad_char_heuristic)::size_type bcu_idx = buf[idx];
 bad_char_heuristic[bcu_idx] = size - idx - 1;
   }
-  for (size_t s = 0; s <= (region_size - size);) {
+
+  // Memory we're currently searching through.
+  llvm::SmallVector mem;
+  // Position of the memory buffer.
+  addr_t mem_pos = low;
+  // Maximum number of bytes read (and buffered). We need to read at least
+  // `size` bytes for a successful match.
+  const size_t max_read_size = std::max(size, 0x1);
+
+  for (addr_t s = low; s <= (high - size);) {
+if (s + size > mem.size() + mem_pos) {
+  // We need to read more data. We don't attempt to reuse the data we've
+  // already read (up to `size-1` bytes from `s` to `mem_pos+mem.size()`).
+  // This is fine for patterns much smaller than max_read_size. For very
+  // long patterns we may need to do something more elaborate.
+  mem.resize_for_overwrite(max_read_size);
+  Status error;
+  mem.resize(
+  ReadMemory(s, mem.data(), std::min(mem.size(), high - s), error));
+  mem_pos = s;
+  if (error.Fail() || size > mem.size()) {
+// We didn't read enough data. Skip to the next memory region.
+MemoryRegionInfo info;
+error = GetMemoryRegionInfo(mem_pos + mem.size(), info);
+if (error.Fail())
+  break;
+s = info.GetRange().GetRangeEnd();
+continue;
+  }
+}
 int64_t j = size - 1;
-while (j >= 0 && buf[j] == iterator[s + j])
+while (j >= 0 && buf[j] == mem[s + j - mem_pos])
   j--;
 if (j < 0)
-  return low + s;
-else
-  s += bad_char_heuristic[iterator[s + size - 1]];
+  return s; // We have a match.
+s += bad_char_heuristic[mem[s + size - 1 - mem_pos]];
   }
 
   return LLDB_INVALID_ADDRESS;
diff --git a/lldb/test/API/functionalities/memory/find/TestMemoryFind.py 
b/lldb/test/API/functionalities/memory/find/TestMemoryFind.py
index 09611cc808777d..88fbd0c192a58e 100644
--- a/lldb/test/API/functionalities/memory/find/TestMemoryFind.py
+++ b/lldb/test/API/functionalities/memory/find/Tes

[Lldb-commits] [lldb] [lldb] Updated TCPSocket to listen multiple ports on the same single thread (PR #104797)

2024-08-27 Thread Pavel Labath via lldb-commits

labath wrote:

Having a single socket listen on multiple ports sounds like a bad idea to me. 
The socket class is complicated enough as it is, and it's not the way the 
MainLoop class was meant to be used (the idea being for it to be created at the 
topmost level possible, so that any number of users could register their 
handlers).

I think that a better and more versatile API would be something like:
```
std::vector /*or sth like that*/ 
TCPSocket::Accept(MainLoop &mainloop, 
std::function accepted_socket)> /*or similar*/ 
socket_cb);
```

Then we could create two sockets and have them listen on the same main loop 
instance. It should be possible to reimplement the existing TCPSocket::Accept 
function on top of this API.

https://github.com/llvm/llvm-project/pull/104797
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Turn lldb_private::Status into a value type. (PR #106163)

2024-08-27 Thread Pavel Labath via lldb-commits

labath wrote:

I like this. I have just two remarks:
- it might be better to split this into three steps (add new APIs, port to new 
APIs, remove old APIs), as that will make reverts easier/less disruptive (I 
don't know how much we can trust pre-commit CI these days, but I wouldn't be 
surprised if this breaks some platform-specific code).
- since this seems like a perfect opportunity to bikesh^Wdiscuss the names, I'm 
going to ask if there's any appetite for shortening some of the new factory 
functions. `Status::FromErrorStringWithFormatv` is a bit of a mouthful, so I 
was thinking if we could  use something shorter instead (`Status::FromFormatv` 
or even `Status::Formatv`) ?

https://github.com/llvm/llvm-project/pull/106163
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Add 'FindInMemory()' overload for PostMortemProcess. (PR #102536)

2024-08-27 Thread Pavel Labath via lldb-commits

labath wrote:

> Let me know if you guys don't want this patch in. I will closed it and apply 
> it to our local branch. @jasonmolenda @labath

Without claiming official authority to make this decision, I'm going to say 
that *I* don't think this is a good idea, because the patch is very (and IMO, 
unnecessarily) specific to post mortem processes. My (currently WIP) patch 
(#104193) gets you most of the speed benefits, works with live processes as 
well and it does that without introducing any new APIs.

What you do downstream is up to you, but I would encourage you to wait for me 
to finish up that patch, as this doesn't look like something that would be a 
good idea to carry downstream (I certainly wouldn't want to do it).

https://github.com/llvm/llvm-project/pull/102536
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] unique_ptr-ify some GetUserExpression APIs. (PR #106034)

2024-08-27 Thread Pavel Labath via lldb-commits

https://github.com/labath approved this pull request.


https://github.com/llvm/llvm-project/pull/106034
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][test] Improve toolchain detection in Makefile.rules (PR #102185)

2024-08-26 Thread Pavel Labath via lldb-commits


@@ -96,16 +98,120 @@ def getArchSpec(self, architecture):
 """
 return ["ARCH=" + architecture] if architecture else []
 
-def getCCSpec(self, compiler):
+def getToolchainSpec(self, compiler):
 """
-Helper function to return the key-value string to specify the compiler
+Helper function to return the key-value strings to specify the 
toolchain
 used for the make system.
 """
 cc = compiler if compiler else None
 if not cc and configuration.compiler:
 cc = configuration.compiler
+
 if cc:
-return ['CC="%s"' % cc]
+exe_ext = ""
+if lldbplatformutil.getHostPlatform() == "windows":
+exe_ext = ".exe"
+
+cc = cc.strip()
+cc_path = pathlib.Path(cc)
+cc = cc_path.as_posix()
+
+# We can get CC compiler string in the following formats:
+#  [] - such as 'xrun clang', 'xrun 
/usr/bin/clang' & etc
+#
+# Where  could contain the following parts:
+#   [.]   - sucn as 
'clang', 'clang.exe' ('clamg-cl.exe'?)
+#   -[.]   - such as 
'armv7-linux-gnueabi-gcc'
+#   /[.]- such as 
'/usr/bin/clang', 'c:\path\to\compiler\clang,exe'
+#   /-[.]- such as 
'/usr/bin/clang', 'c:\path\to\compiler\clang,exe'
+
+cc_ext = cc_path.suffix
+# Compiler name without extension
+cc_name = cc_path.stem.split(" ")[-1]
+
+# A kind of compiler (canonical name): clang, gcc, cc & etc.
+cc_type = cc_name
+# A triple prefix of compiler name: gcc
+cc_prefix = ""
+if not "clang-cl" in cc_name and not "llvm-gcc" in cc_name:
+cc_name_parts = cc_name.split("-")
+cc_type = cc_name_parts[-1]
+if len(cc_name_parts) > 1:
+cc_prefix = "-".join(cc_name_parts[:-1]) + "-"
+
+# A kind of C++ compiler.
+cxx_types = {
+"icc": "icpc",
+"llvm-gcc": "llvm-g++",
+"gcc": "g++",
+"cc": "c++",
+"clang": "clang++",
+}
+cxx_type = cxx_types.get(cc_type, cc_type)
+
+cc_dir = cc_path.parent
+
+def getLlvmUtil(util_name):
+llvm_tools_dir = os.getenv("LLVM_TOOLS_DIR", cc_dir.as_posix())
+return llvm_tools_dir + "/" + util_name + exe_ext

labath wrote:

```suggestion
return os.path.join(llvm_tools_dir, util_name + exe_ext)
```

https://github.com/llvm/llvm-project/pull/102185
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][test] Improve toolchain detection in Makefile.rules (PR #102185)

2024-08-26 Thread Pavel Labath via lldb-commits


@@ -102,15 +102,33 @@ endif
 # If you change the defaults of CC, be sure to also change it in the file
 # test/builders/builder_base.py, which provides a Python way to return the
 # value of the make variable CC -- getCompiler().
-#
-# See also these functions:
-#   o cxx_compiler
-#   o cxx_linker
 #--
 ifeq "$(CC)" ""
 $(error "C compiler is not specified. Please run tests through lldb-dotest or 
lit")
 endif
 
+# Remove all " and ' characters from the path. Also remove surrounding space 
chars.
+cstrip = $(strip $(subst ",,$(subst ',,$(1
+override CC := $(call cstrip,$(CC))
+override CCC := $(call cstrip,$(CCC))
+override CXX := $(call cstrip,$(CXX))
+# Always override the linker. Assign already normalized CC.
+override LD := $(call cstrip,$(CC))
+# A kind of linker. It always gets retrieved from CC.
+override LDC := $(call cstrip,$(CCC))
+override OBJCOPY := $(call cstrip,$(OBJCOPY))
+override STRIP := $(call cstrip,$(STRIP))
+override ARCHIVER := $(call cstrip,$(ARCHIVER))
+override AR := $(call cstrip,$(AR))
+override DWP := $(call cstrip,$(DWP))
+override LLVM_AR := $(call cstrip,$(LLVM_AR))

labath wrote:

Is this really necessary? Could we get rid of it by *not* passing the extra 
quotes in the python cmd?

https://github.com/llvm/llvm-project/pull/102185
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][test] Improve toolchain detection in Makefile.rules (PR #102185)

2024-08-26 Thread Pavel Labath via lldb-commits


@@ -96,16 +98,120 @@ def getArchSpec(self, architecture):
 """
 return ["ARCH=" + architecture] if architecture else []
 
-def getCCSpec(self, compiler):
+def getToolchainSpec(self, compiler):
 """
-Helper function to return the key-value string to specify the compiler
+Helper function to return the key-value strings to specify the 
toolchain
 used for the make system.
 """
 cc = compiler if compiler else None
 if not cc and configuration.compiler:
 cc = configuration.compiler
+
 if cc:
-return ['CC="%s"' % cc]
+exe_ext = ""
+if lldbplatformutil.getHostPlatform() == "windows":
+exe_ext = ".exe"
+
+cc = cc.strip()
+cc_path = pathlib.Path(cc)
+cc = cc_path.as_posix()
+
+# We can get CC compiler string in the following formats:
+#  [] - such as 'xrun clang', 'xrun 
/usr/bin/clang' & etc
+#
+# Where  could contain the following parts:
+#   [.]   - sucn as 
'clang', 'clang.exe' ('clamg-cl.exe'?)
+#   -[.]   - such as 
'armv7-linux-gnueabi-gcc'
+#   /[.]- such as 
'/usr/bin/clang', 'c:\path\to\compiler\clang,exe'
+#   /-[.]- such as 
'/usr/bin/clang', 'c:\path\to\compiler\clang,exe'
+
+cc_ext = cc_path.suffix
+# Compiler name without extension
+cc_name = cc_path.stem.split(" ")[-1]
+
+# A kind of compiler (canonical name): clang, gcc, cc & etc.
+cc_type = cc_name
+# A triple prefix of compiler name: gcc
+cc_prefix = ""
+if not "clang-cl" in cc_name and not "llvm-gcc" in cc_name:
+cc_name_parts = cc_name.split("-")
+cc_type = cc_name_parts[-1]
+if len(cc_name_parts) > 1:
+cc_prefix = "-".join(cc_name_parts[:-1]) + "-"
+
+# A kind of C++ compiler.
+cxx_types = {
+"icc": "icpc",
+"llvm-gcc": "llvm-g++",
+"gcc": "g++",
+"cc": "c++",
+"clang": "clang++",
+}
+cxx_type = cxx_types.get(cc_type, cc_type)
+
+cc_dir = cc_path.parent
+
+def getLlvmUtil(util_name):
+llvm_tools_dir = os.getenv("LLVM_TOOLS_DIR", cc_dir.as_posix())
+return llvm_tools_dir + "/" + util_name + exe_ext
+
+def getToolchainUtil(util_name):
+return (cc_dir / (cc_prefix + util_name + cc_ext)).as_posix()
+
+cxx = getToolchainUtil(cxx_type)
+
+util_names = {
+"OBJCOPY": "objcopy",
+"STRIP": "objcopy",
+"ARCHIVER": "ar",
+"DWP": "dwp",
+}
+utils = []
+
+# Note: LLVM_AR is currently required by API TestBSDArchives.py 
tests.
+# Assembly a full path to llvm-ar for given LLVM_TOOLS_DIR;
+# otherwise assume we have llvm-ar at the same place where is CC 
specified.
+if not os.getenv("LLVM_AR"):
+llvm_ar = getLlvmUtil("llvm-ar")
+utils.extend(['LLVM_AR="%s"' % llvm_ar])
+
+if not lldbplatformutil.platformIsDarwin():
+if os.getenv("USE_LLVM_TOOLS"):
+# Use the llvm project tool instead of the system defaults.
+for var, name in util_names.items():
+utils.append('%s="%s"' % (var, getLlvmUtil("llvm-" + 
name)))
+utils.extend(['AR="%s"' % llvm_ar])
+elif cc_type in ["clang", "cc", "gcc"]:
+util_paths = {}
+# Assembly a toolchain side tool cmd based on passed CC.
+for var, name in util_names.items():
+# Do not override explicity specified tool from the 
cmd line.
+if not os.getenv(var):
+util_paths[var] = getToolchainUtil(name)
+else:
+util_paths[var] = os.getenv(var)
+utils.extend(['AR="%s"' % util_paths["ARCHIVER"]])
+
+# Look for llvm-dwp or gnu dwp
+if not lldbutil.which(util_paths["DWP"]):
+util_paths["DWP"] = getToolchainUtil("llvm-dwp")
+if not lldbutil.which(util_paths["DWP"]):
+util_paths["DWP"] = lldbutil.which("llvm-dwp")
+if not util_paths["DWP"]:
+util_paths["DWP"] = lldbutil.which("dwp")
+if not util_paths["DWP"]:
+del util_paths["DWP"]
+
+for var, path in util_paths.items():
+   

[Lldb-commits] [lldb] [lldb][test] Improve toolchain detection in Makefile.rules (PR #102185)

2024-08-26 Thread Pavel Labath via lldb-commits


@@ -213,7 +229,7 @@ endif
 LIMIT_DEBUG_INFO_FLAGS =
 NO_LIMIT_DEBUG_INFO_FLAGS =
 MODULE_DEBUG_INFO_FLAGS =
-ifneq (,$(findstring clang,$(CC)))
+ifeq ($(CCC), clang)

labath wrote:

What does CCC stand for?

https://github.com/llvm/llvm-project/pull/102185
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][test] Improve toolchain detection in Makefile.rules (PR #102185)

2024-08-26 Thread Pavel Labath via lldb-commits


@@ -96,16 +98,120 @@ def getArchSpec(self, architecture):
 """
 return ["ARCH=" + architecture] if architecture else []
 
-def getCCSpec(self, compiler):
+def getToolchainSpec(self, compiler):
 """
-Helper function to return the key-value string to specify the compiler
+Helper function to return the key-value strings to specify the 
toolchain
 used for the make system.
 """
 cc = compiler if compiler else None
 if not cc and configuration.compiler:
 cc = configuration.compiler
+
 if cc:
-return ['CC="%s"' % cc]
+exe_ext = ""
+if lldbplatformutil.getHostPlatform() == "windows":
+exe_ext = ".exe"
+
+cc = cc.strip()
+cc_path = pathlib.Path(cc)
+cc = cc_path.as_posix()
+
+# We can get CC compiler string in the following formats:
+#  [] - such as 'xrun clang', 'xrun 
/usr/bin/clang' & etc
+#
+# Where  could contain the following parts:
+#   [.]   - sucn as 
'clang', 'clang.exe' ('clamg-cl.exe'?)
+#   -[.]   - such as 
'armv7-linux-gnueabi-gcc'
+#   /[.]- such as 
'/usr/bin/clang', 'c:\path\to\compiler\clang,exe'
+#   /-[.]- such as 
'/usr/bin/clang', 'c:\path\to\compiler\clang,exe'
+
+cc_ext = cc_path.suffix
+# Compiler name without extension
+cc_name = cc_path.stem.split(" ")[-1]
+
+# A kind of compiler (canonical name): clang, gcc, cc & etc.
+cc_type = cc_name
+# A triple prefix of compiler name: gcc
+cc_prefix = ""
+if not "clang-cl" in cc_name and not "llvm-gcc" in cc_name:
+cc_name_parts = cc_name.split("-")
+cc_type = cc_name_parts[-1]
+if len(cc_name_parts) > 1:
+cc_prefix = "-".join(cc_name_parts[:-1]) + "-"
+
+# A kind of C++ compiler.
+cxx_types = {
+"icc": "icpc",
+"llvm-gcc": "llvm-g++",
+"gcc": "g++",
+"cc": "c++",
+"clang": "clang++",
+}
+cxx_type = cxx_types.get(cc_type, cc_type)
+
+cc_dir = cc_path.parent
+
+def getLlvmUtil(util_name):
+llvm_tools_dir = os.getenv("LLVM_TOOLS_DIR", cc_dir.as_posix())
+return llvm_tools_dir + "/" + util_name + exe_ext
+
+def getToolchainUtil(util_name):
+return (cc_dir / (cc_prefix + util_name + cc_ext)).as_posix()
+
+cxx = getToolchainUtil(cxx_type)
+
+util_names = {
+"OBJCOPY": "objcopy",
+"STRIP": "objcopy",
+"ARCHIVER": "ar",
+"DWP": "dwp",
+}
+utils = []
+
+# Note: LLVM_AR is currently required by API TestBSDArchives.py 
tests.
+# Assembly a full path to llvm-ar for given LLVM_TOOLS_DIR;
+# otherwise assume we have llvm-ar at the same place where is CC 
specified.
+if not os.getenv("LLVM_AR"):
+llvm_ar = getLlvmUtil("llvm-ar")
+utils.extend(['LLVM_AR="%s"' % llvm_ar])
+
+if not lldbplatformutil.platformIsDarwin():
+if os.getenv("USE_LLVM_TOOLS"):

labath wrote:

I don't see any reference to this var in the llvm tree. Who sets this? (and why)

https://github.com/llvm/llvm-project/pull/102185
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][test] Improve toolchain detection in Makefile.rules (PR #102185)

2024-08-26 Thread Pavel Labath via lldb-commits


@@ -96,16 +98,120 @@ def getArchSpec(self, architecture):
 """
 return ["ARCH=" + architecture] if architecture else []
 
-def getCCSpec(self, compiler):
+def getToolchainSpec(self, compiler):
 """
-Helper function to return the key-value string to specify the compiler
+Helper function to return the key-value strings to specify the 
toolchain
 used for the make system.
 """
 cc = compiler if compiler else None
 if not cc and configuration.compiler:
 cc = configuration.compiler
+
 if cc:
-return ['CC="%s"' % cc]
+exe_ext = ""
+if lldbplatformutil.getHostPlatform() == "windows":
+exe_ext = ".exe"
+
+cc = cc.strip()
+cc_path = pathlib.Path(cc)
+cc = cc_path.as_posix()

labath wrote:

What's up with the posix thing? Could we stick to native paths?

https://github.com/llvm/llvm-project/pull/102185
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][test] Improve toolchain detection in Makefile.rules (PR #102185)

2024-08-26 Thread Pavel Labath via lldb-commits


@@ -96,16 +98,107 @@ def getArchSpec(self, architecture):
 """
 return ["ARCH=" + architecture] if architecture else []
 
-def getCCSpec(self, compiler):
+def getToolchainSpec(self, compiler):
 """
-Helper function to return the key-value string to specify the compiler
+Helper function to return the key-value strings to specify the 
toolchain
 used for the make system.
 """
 cc = compiler if compiler else None
 if not cc and configuration.compiler:
 cc = configuration.compiler
+
 if cc:
-return ['CC="%s"' % cc]
+exe_ext = ""
+if lldbplatformutil.getHostPlatform() == "windows":
+exe_ext = ".exe"
+
+cc = cc.strip()
+cc_path = pathlib.Path(cc)
+cc = cc_path.as_posix()
+
+# We can get CC compiler string in the following formats:
+#  [] - such as 'xrun clang', 'xrun 
/usr/bin/clang' & etc
+#
+# Where  could contain the following parts:
+#   [.]   - sucn as 
'clang', 'clang.exe' ('clamg-cl.exe'?)

labath wrote:

```suggestion
#   [.]   - sucn as 
'clang', 'clang.exe' ('clang-cl.exe'?)
```

https://github.com/llvm/llvm-project/pull/102185
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Fix windows debug build after 9d07f43 (PR #104896)

2024-08-26 Thread Pavel Labath via lldb-commits

labath wrote:

Thanks.

https://github.com/llvm/llvm-project/pull/104896
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][NFC] Moved the SharedSocket class to Socket.* (PR #104787)

2024-08-26 Thread Pavel Labath via lldb-commits

https://github.com/labath approved this pull request.


https://github.com/llvm/llvm-project/pull/104787
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][dotest] Add an arg to add DLL search paths. (PR #104514)

2024-08-26 Thread Pavel Labath via lldb-commits

labath wrote:

I think that having fewer knobs that one needs to remember to set to correct 
values is better. Sometimes that's unavoidable, but I wanted to make sure this 
is one of those cases. Embedding this information into the swig bindings sounds 
like a good idea.

https://github.com/llvm/llvm-project/pull/104514
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Removed gdbserver ports map from lldb-server (PR #104238)

2024-08-16 Thread Pavel Labath via lldb-commits


@@ -666,7 +756,23 @@ ConnectionStatus
 ConnectionFileDescriptor::ConnectFD(llvm::StringRef s,
 socket_id_callback_type socket_id_callback,
 Status *error_ptr) {
-#if LLDB_ENABLE_POSIX
+#ifdef _WIN32
+  int64_t fd = -1;
+  if (!s.getAsInteger(0, fd)) {
+// Assume we own fd.
+std::unique_ptr tcp_socket =
+std::make_unique((NativeSocket)fd, true, false);
+m_io_sp = std::move(tcp_socket);
+m_uri = s.str();
+return eConnectionStatusSuccess;
+  }

labath wrote:

I do. I don't think it's a good state if half of our "fd" refer to file 
descriptors and others to handles. (In reality, it's not half/half, but more 
like 90/10, and this is the minority case/)

https://github.com/llvm/llvm-project/pull/104238
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Removed gdbserver ports map from lldb-server (PR #104238)

2024-08-16 Thread Pavel Labath via lldb-commits


@@ -20,6 +20,10 @@
 #include 
 #include 
 
+#ifdef SendMessage
+#undef SendMessage
+#endif
+

labath wrote:

If it's just due to header moves that's fine.

It's definitely possible to isolate the windows headers. Lllvm does that by 
making sure it only includes the system headers in implementation files (of the 
library wrapping them). It'd be nice to have lldb do the same thing, but we're 
pretty far off.

https://github.com/llvm/llvm-project/pull/104238
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Removed gdbserver ports map from lldb-server (PR #104238)

2024-08-16 Thread Pavel Labath via lldb-commits

https://github.com/labath edited 
https://github.com/llvm/llvm-project/pull/104238
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Removed gdbserver ports map from lldb-server (PR #104238)

2024-08-16 Thread Pavel Labath via lldb-commits


@@ -601,15 +570,24 @@ int main_platform(int argc, char *argv[]) {
   // If not running as a server, this process will not accept
   // connections while a connection is active.
   acceptor_up.reset();
-
-  // When not running in server mode, use all available ports
-  platform.SetPortMap(std::move(gdbserver_portmap));
 }
 
 platform.SetConnection(std::unique_ptr(conn));
 client_handle(platform, inferior_arguments);
   } while (g_server);
 
+  // FIXME: Make TCPSocket::CloseListenSockets() public and implement
+  // Acceptor::Close().
+  /*
+  if (acceptor_gdb && gdb_thread.IsJoinable()) {
+g_listen_gdb = false;
+static_cast(acceptor_gdb->m_listener_socket_up.get())
+->CloseListenSockets();
+lldb::thread_result_t result;
+gdb_thread.Join(&result);
+  }
+  */
+

labath wrote:

It's not just linux, that kind of pattern is racy on all posix operating 
systems. It's possible that other OSs will not hang in this situation, but that 
doesn't mean that usage is safe. Even on windows, that can cause you to call 
(WSA)accept on a dangling socket descriptor -- since you can never be sure 
whether the other thread has already called the function, or is it merely about 
to do that. It's less likely to blow up in your face because windows does not 
recycle FDs like posix systems do, but I wouldn't recommend that 
anywhere.

Exiting the main thread may be convenient, but I definitely wouldn't call it 
"safe", as that means the other thread may be running, executing random code 
and memory, while the main thread is shutting everything down and deleting that 
memory. It may not matter often, as the process is done anyway, but if anyone 
was observing the exit code, he may be suprised to see a SEGV instead of a 
clean exit.

Reasoning about a singlethreaded application is much easier than for a 
multithreaded one. We should already have all of the pieces to do this the 
right way, so let's just do that.

https://github.com/llvm/llvm-project/pull/104238
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Removed gdbserver ports map from lldb-server (PR #104238)

2024-08-16 Thread Pavel Labath via lldb-commits


@@ -47,111 +48,10 @@ using namespace llvm;
 
 // option descriptors for getopt_long_only()
 
-#ifdef _WIN32
-typedef pipe_t shared_fd_t;
-const shared_fd_t kInvalidSharedFD = LLDB_INVALID_PIPE;
-#else
-typedef NativeSocket shared_fd_t;
-const shared_fd_t kInvalidSharedFD = Socket::kInvalidSocketValue;
-#endif
-
-class SharedSocket {
-public:
-  SharedSocket(Connection *conn, Status &error) {
-m_fd = kInvalidSharedFD;
-
-const Socket *socket =
-static_cast(conn->GetReadObject().get());
-if (socket == nullptr) {
-  error = Status("invalid conn socket");
-  return;
-}
-
-#ifdef _WIN32
-m_socket = socket->GetNativeSocket();
-
-// Create a pipe to transfer WSAPROTOCOL_INFO to the child process.
-error = m_socket_pipe.CreateNew(true);
-if (error.Fail())
-  return;
-
-m_fd = m_socket_pipe.GetReadPipe();
-#else
-m_fd = socket->GetNativeSocket();
-error = Status();
-#endif
-  }
-
-  shared_fd_t GetSendableFD() { return m_fd; }
-
-  Status CompleteSending(lldb::pid_t child_pid) {
-#ifdef _WIN32
-// Transfer WSAPROTOCOL_INFO to the child process.
-m_socket_pipe.CloseReadFileDescriptor();
-
-WSAPROTOCOL_INFO protocol_info;
-if (::WSADuplicateSocket(m_socket, child_pid, &protocol_info) ==
-SOCKET_ERROR) {
-  int last_error = ::WSAGetLastError();
-  return Status("WSADuplicateSocket() failed, error: %d", last_error);
-}
-
-size_t num_bytes;
-Status error =
-m_socket_pipe.WriteWithTimeout(&protocol_info, sizeof(protocol_info),
-   std::chrono::seconds(10), num_bytes);
-if (error.Fail())
-  return error;
-if (num_bytes != sizeof(protocol_info))
-  return Status("WriteWithTimeout(WSAPROTOCOL_INFO) failed: %d bytes",
-num_bytes);
-#endif
-return Status();
-  }
-
-  static Status GetNativeSocket(shared_fd_t fd, NativeSocket &socket) {
-#ifdef _WIN32
-socket = Socket::kInvalidSocketValue;
-// Read WSAPROTOCOL_INFO from the parent process and create NativeSocket.
-WSAPROTOCOL_INFO protocol_info;
-{
-  Pipe socket_pipe(fd, LLDB_INVALID_PIPE);
-  size_t num_bytes;
-  Status error =
-  socket_pipe.ReadWithTimeout(&protocol_info, sizeof(protocol_info),
-  std::chrono::seconds(10), num_bytes);
-  if (error.Fail())
-return error;
-  if (num_bytes != sizeof(protocol_info)) {
-return Status(
-"socket_pipe.ReadWithTimeout(WSAPROTOCOL_INFO) failed: % d bytes",
-num_bytes);
-  }
-}
-socket = ::WSASocket(FROM_PROTOCOL_INFO, FROM_PROTOCOL_INFO,
- FROM_PROTOCOL_INFO, &protocol_info, 0, 0);
-if (socket == INVALID_SOCKET) {
-  return Status("WSASocket(FROM_PROTOCOL_INFO) failed: error %d",
-::WSAGetLastError());
-}
-return Status();
-#else
-socket = fd;
-return Status();
-#endif
-  }
-
-private:
-#ifdef _WIN32
-  Pipe m_socket_pipe;
-  NativeSocket m_socket;
-#endif
-  shared_fd_t m_fd;
-};
-
 static int g_debug = 0;
 static int g_verbose = 0;
 static int g_server = 0;
+static volatile bool g_listen_gdb = true;

labath wrote:

yeah, but volatile still isn't the right choice for that. It would have to be 
an atomic var.

https://github.com/llvm/llvm-project/pull/104238
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][NFC] Moved FindSchemeByProtocol() from Acceptor to Socket (PR #104439)

2024-08-16 Thread Pavel Labath via lldb-commits

https://github.com/labath approved this pull request.


https://github.com/llvm/llvm-project/pull/104439
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Add Populate Methods for ELFLinuxPrPsInfo and ELFLinuxPrStatus (PR #104109)

2024-08-16 Thread Pavel Labath via lldb-commits

https://github.com/labath approved this pull request.


https://github.com/llvm/llvm-project/pull/104109
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb][dotest] Add an arg to add DLL search paths. (PR #104514)

2024-08-16 Thread Pavel Labath via lldb-commits

labath wrote:

Can those swift libraries be found automatically? For example if they're at 
some known path relative to lldb or the swift compiler?

https://github.com/llvm/llvm-project/pull/104514
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [LLDB][Minidump] Fix ProcessMinidump::GetMemoryRegions to include 64b regions when /proc/pid maps are missing. (PR #101086)

2024-08-16 Thread Pavel Labath via lldb-commits


@@ -457,34 +456,24 @@ MinidumpParser::FindMemoryRange(lldb::addr_t addr) {
 }
   }
 
-  // Some Minidumps have a Memory64ListStream that captures all the heap memory
-  // (full-memory Minidumps).  We can't exactly use the same loop as above,
-  // because the Minidump uses slightly different data structures to describe
-  // those
-
-  if (!data64.empty()) {
-llvm::ArrayRef memory64_list;
-uint64_t base_rva;
-std::tie(memory64_list, base_rva) =
-MinidumpMemoryDescriptor64::ParseMemory64List(data64);
-
-if (memory64_list.empty())
-  return std::nullopt;
-
-for (const auto &memory_desc64 : memory64_list) {
-  const lldb::addr_t range_start = memory_desc64.start_of_memory_range;
-  const size_t range_size = memory_desc64.data_size;
-
-  if (base_rva + range_size > GetData().size())
-return std::nullopt;
-
-  if (range_start <= addr && addr < range_start + range_size) {
-return minidump::Range(range_start,
-   GetData().slice(base_rva, range_size));
+  if (!GetStream(StreamType::Memory64List).empty()) {
+llvm::Error err = llvm::Error::success();
+for (const auto &memory_desc :  GetMinidumpFile().getMemory64List(err)) {
+  // Explicit error check so we can return from within

labath wrote:

Are you sure that's necessary? If I read 
[this](https://github.com/llvm/llvm-project/blob/4c77cc634d49782aceff77f7ec4e6183ec223020/llvm/include/llvm/ADT/fallible_iterator.h#L66)
 right, it shouldn't be.

https://github.com/llvm/llvm-project/pull/101086
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [LLDB][Minidump] Fix ProcessMinidump::GetMemoryRegions to include 64b regions when /proc/pid maps are missing. (PR #101086)

2024-08-16 Thread Pavel Labath via lldb-commits


@@ -553,53 +547,49 @@ static bool
 CreateRegionsCacheFromMemoryList(MinidumpParser &parser,
  std::vector ®ions) {
   Log *log = GetLog(LLDBLog::Modules);
+  // Cache the expected memory32 into an optional
+  // because double checking the expected triggers the unchecked warning.
+  std::optional> memory32_list;
   auto ExpectedMemory = parser.GetMinidumpFile().getMemoryList();
   if (!ExpectedMemory) {
 LLDB_LOG_ERROR(log, ExpectedMemory.takeError(),
"Failed to read memory list: {0}");

labath wrote:

I don't get the double-checked comment. Something like
```
  if (auto ExpectedMemory = parser.GetMinidumpFile().getMemoryList(); 
!ExpectedMemory) {
LLDB_LOG_ERROR(log, ExpectedMemory.takeError(),
   "Failed to read memory list: {0}");
  } else {
for (const MemoryDescriptor &memory_desc : *ExpectedMemory) {
  ...
```

ought to work just fine (other ways to structure this are also possible, but I 
don't know if you can do an early exit or whatever...). Once you check the 
success state of the expected object and find that it succeeded, the check 
requirement is satisfied. No further checking is needed. That would be 
different if you tried to copy the value into some other (expected) object, as 
then you would need to check that one as well, but the solution to that is 
simple -- just don't do that. :)

https://github.com/llvm/llvm-project/pull/101086
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [LLDB][Minidump] Fix ProcessMinidump::GetMemoryRegions to include 64b regions when /proc/pid maps are missing. (PR #101086)

2024-08-16 Thread Pavel Labath via lldb-commits

https://github.com/labath commented:

A couple of small things, but I think it's fine overall. I'm going to be OOO 
next week, no need to wait for me.

https://github.com/llvm/llvm-project/pull/101086
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [LLDB][Minidump] Fix ProcessMinidump::GetMemoryRegions to include 64b regions when /proc/pid maps are missing. (PR #101086)

2024-08-16 Thread Pavel Labath via lldb-commits


@@ -457,34 +456,24 @@ MinidumpParser::FindMemoryRange(lldb::addr_t addr) {
 }
   }
 
-  // Some Minidumps have a Memory64ListStream that captures all the heap memory
-  // (full-memory Minidumps).  We can't exactly use the same loop as above,
-  // because the Minidump uses slightly different data structures to describe
-  // those
-
-  if (!data64.empty()) {
-llvm::ArrayRef memory64_list;
-uint64_t base_rva;
-std::tie(memory64_list, base_rva) =
-MinidumpMemoryDescriptor64::ParseMemory64List(data64);
-
-if (memory64_list.empty())
-  return std::nullopt;
-
-for (const auto &memory_desc64 : memory64_list) {
-  const lldb::addr_t range_start = memory_desc64.start_of_memory_range;
-  const size_t range_size = memory_desc64.data_size;
-
-  if (base_rva + range_size > GetData().size())
-return std::nullopt;
-
-  if (range_start <= addr && addr < range_start + range_size) {
-return minidump::Range(range_start,
-   GetData().slice(base_rva, range_size));
+  if (!GetStream(StreamType::Memory64List).empty()) {
+llvm::Error err = llvm::Error::success();
+for (const auto &memory_desc :  GetMinidumpFile().getMemory64List(err)) {
+  // Explicit error check so we can return from within
+  if (memory_desc.first.StartOfMemoryRange <= addr 
+  && addr < memory_desc.first.StartOfMemoryRange + 
memory_desc.first.DataSize 
+  && !err) {
+return minidump::Range(memory_desc.first.StartOfMemoryRange, 
memory_desc.second);
   }
-  base_rva += range_size;
 }
+
+if (err)
+  // Without std::move(err) fails with 
+  // error: call to deleted constructor of '::llvm::Error'

labath wrote:

That's just the general property of llvm::Error objects (that they can't be 
copied) and the LLDB_LOG_ERROR macro (that it consumes the error). I don't 
think it needs a special comment.

https://github.com/llvm/llvm-project/pull/101086
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [LLDB][Minidump] Fix ProcessMinidump::GetMemoryRegions to include 64b regions when /proc/pid maps are missing. (PR #101086)

2024-08-16 Thread Pavel Labath via lldb-commits

https://github.com/labath edited 
https://github.com/llvm/llvm-project/pull/101086
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] Disable ThreadPlanSingleThreadTimeout during step over breakpoint (PR #104532)

2024-08-16 Thread Pavel Labath via lldb-commits

labath wrote:

Thanks for looking into this. I'll also defer to Jim, but I'll note two things:
- stepping over a breakpoint can still block -- if the breakpoint is on a 
syscall instruction, and the syscall blocks. However, problem with missed 
breakpoints is also real, and probably more important than blocked steps. As I 
don't see a way to resume other threads while not risking them missing 
breakpoints, this may still be the right thing to do.
- I was amused by the hedging in "potential race condition". The race is real, 
not potential. The fact that you win the race most of the time doesn't mean it 
doesn't exist. :P

https://github.com/llvm/llvm-project/pull/104532
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] Reland "[lldb] Reland 2402b3213c2f with `/H` to debug the windows build issue (PR #101672)

2024-08-16 Thread Pavel Labath via lldb-commits

labath wrote:

Okay, that sort of makes sense. However, unless you actually want to be able to 
disable each of these plugins independently (which it sounds like you don't), 
then this is a very.. baroque way to achieve the desired effect (to have 
multiple (plugin) classes registered with the plugin manager).

All of that cmake plugin logic is just a very elaborate way to invoke 
`SomeClass::Initialize()`. Plugins are registered through 
`PluginManager::RegisterPlugin`. While the usual case is that the `Initialize` 
function performs a single `RegisterPlugin` call (and pretty much nothing 
else), nothing actually depends or enforces that, and we do have "plugins" 
where a single plugin library registers more than plugin class (e.g. 
`lldbPluginPlatformMacOSX`). I think it would be perfectly reasonable to 
register all of these classes from 
`ScriptInterpreterPythonInterfaces::Initialize` (and maybe even from 
`ScriptInterpreterPython::Initialize`).

https://github.com/llvm/llvm-project/pull/101672
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [WIP] memory find speedup+bugfix (PR #104193)

2024-08-15 Thread Pavel Labath via lldb-commits

https://github.com/labath updated 
https://github.com/llvm/llvm-project/pull/104193

>From a537a48c444e9dec3a85241d9726d6f3187a43cf Mon Sep 17 00:00:00 2001
From: Pavel Labath 
Date: Wed, 14 Aug 2024 19:58:27 +0200
Subject: [PATCH] [WIP] memory find speedup+bugfix

---
 lldb/source/Target/Process.cpp| 58 ++---
 .../memory/find/TestMemoryFind.py | 10 +++
 .../API/functionalities/memory/find/main.cpp  | 82 +--
 3 files changed, 108 insertions(+), 42 deletions(-)

diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index e3c4f2ee398cc4..4e3d7651a066ec 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -114,33 +114,6 @@ class ProcessOptionValueProperties
   }
 };
 
-class ProcessMemoryIterator {
-public:
-  ProcessMemoryIterator(Process &process, lldb::addr_t base)
-  : m_process(process), m_base_addr(base) {}
-
-  bool IsValid() { return m_is_valid; }
-
-  uint8_t operator[](lldb::addr_t offset) {
-if (!IsValid())
-  return 0;
-
-uint8_t retval = 0;
-Status error;
-if (0 == m_process.ReadMemory(m_base_addr + offset, &retval, 1, error)) {
-  m_is_valid = false;
-  return 0;
-}
-
-return retval;
-  }
-
-private:
-  Process &m_process;
-  const lldb::addr_t m_base_addr;
-  bool m_is_valid = true;
-};
-
 static constexpr OptionEnumValueElement g_follow_fork_mode_values[] = {
 {
 eFollowParent,
@@ -3368,20 +3341,37 @@ lldb::addr_t Process::FindInMemory(lldb::addr_t low, 
lldb::addr_t high,
 return LLDB_INVALID_ADDRESS;
 
   std::vector bad_char_heuristic(256, size);
-  ProcessMemoryIterator iterator(*this, low);
-
   for (size_t idx = 0; idx < size - 1; idx++) {
 decltype(bad_char_heuristic)::size_type bcu_idx = buf[idx];
 bad_char_heuristic[bcu_idx] = size - idx - 1;
   }
-  for (size_t s = 0; s <= (region_size - size);) {
+
+  llvm::SmallVector mem;
+  addr_t mem_pos = low;
+  const size_t read_size = std::max(size, 0x1);
+
+  for (addr_t s = low; s <= (high - size);) {
+if (s + size >= mem.size() + mem_pos) {
+  mem.resize_for_overwrite(read_size);
+  Status error;
+  mem.resize(
+  ReadMemory(s, mem.data(), std::min(mem.size(), high - s), error));
+  mem_pos = s;
+  if (error.Fail()) {
+MemoryRegionInfo info;
+error = GetMemoryRegionInfo(s, info);
+if (error.Fail())
+  return LLDB_INVALID_ADDRESS;
+s = info.GetRange().GetRangeEnd();
+continue;
+  }
+}
 int64_t j = size - 1;
-while (j >= 0 && buf[j] == iterator[s + j])
+while (j >= 0 && buf[j] == mem[s + j - mem_pos])
   j--;
 if (j < 0)
-  return low + s;
-else
-  s += bad_char_heuristic[iterator[s + size - 1]];
+  return s;
+s += bad_char_heuristic[mem[s + size - 1 - mem_pos]];
   }
 
   return LLDB_INVALID_ADDRESS;
diff --git a/lldb/test/API/functionalities/memory/find/TestMemoryFind.py 
b/lldb/test/API/functionalities/memory/find/TestMemoryFind.py
index 09611cc808777d..a606899498b722 100644
--- a/lldb/test/API/functionalities/memory/find/TestMemoryFind.py
+++ b/lldb/test/API/functionalities/memory/find/TestMemoryFind.py
@@ -79,3 +79,13 @@ def test_memory_find(self):
 'memory find -s "nothere" `stringdata` `stringdata+10`',
 substrs=["data not found within the range."],
 )
+
+pagesize = self.frame().FindVariable("pagesize").GetValueAsUnsigned()
+mem_with_holes = 
self.frame().FindVariable("mem_with_holes").GetValueAsUnsigned()
+matches_var = self.frame().FindVariable("matches")
+self.assertEqual(matches_var.GetNumChildren(), 4)
+matches = [f'data found at location: 
{matches_var.GetChildAtIndex(i).GetValueAsUnsigned():#x}' for i in range(4)]
+self.expect(
+'memory find -c 5 -s "needle" `mem_with_holes` 
`mem_with_holes+5*pagesize`',
+substrs=matches + ["no more matches within the range"],
+)
diff --git a/lldb/test/API/functionalities/memory/find/main.cpp 
b/lldb/test/API/functionalities/memory/find/main.cpp
index e3dcfc762ee0f9..e5525e3ca1f73f 100644
--- a/lldb/test/API/functionalities/memory/find/main.cpp
+++ b/lldb/test/API/functionalities/memory/find/main.cpp
@@ -1,9 +1,75 @@
-#include 
-#include 
-
-int main (int argc, char const *argv[])
-{
-const char* stringdata = "hello world; I like to write text in const char 
pointers";
-uint8_t bytedata[] = 
{0xAA,0xBB,0xCC,0xDD,0xEE,0xFF,0x00,0x11,0x22,0x33,0x44,0x55,0x66,0x77,0x88,0x99};
-return 0; // break here
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef _WIN32
+#include "Windows.h"
+
+int getpagesize() {
+  SYSTEM_INFO system_info;
+  GetSystemInfo(&system_info);
+  return system_info.dwPageSize;
+}
+
+void *allocate_memory_with_holes() {
+  int pagesize = getpagesize();
+  void *mem = VirtualAlloc(nullptr, 5 * pagesize, MEM_RESERVE, PAGE_NOACCESS);
+  if (!mem) {
+

[Lldb-commits] [lldb] [lldb] Add 'FindInMemory()' overload for PostMortemProcess. (PR #102536)

2024-08-15 Thread Pavel Labath via lldb-commits

labath wrote:

Thanks for chiming in Jason. I'm glad this is getting attention, as I think 
that our memory reading APIs are in need of an overhaul -- and it gives me a 
chance to share my idea. ;)

I don't think that we need to change all of the APIs to return their memory. 
Sometimes reading the memory into a caller-provided buffer makes perfect sense. 
What I would like to do is to regularize all the APIs, and avoid the need to 
implement the logic of e.g. "reading a c string twice". The approach I'd like 
is to define a `MemoryReader` interface, which would contain all of the ways 
one may want to read a memory (as a string, integer, etc..). It would also 
contain a default implementation of the APIs so that an implementation would 
only need to implement one or two methods and have everything fall out 
naturally. Then, anything that wants to support that interface, can just 
inherit from it and implement the required method. If something wants to 
provide two different methods for reading the memory (I believe Process 
supports cached and uncached reads), it can just do two implementations, and 
provide them via different accessors (e.g. 
`process->CachedMemory()->ReadInt(addr)` and 
process->DirectMemory()->ReadInt(addr)`).

Another nice part about this is that if we have some code, which only needs to 
read memory, then it can take a MemoryReader reference (instead of Process or 
whatever), and then it can be unit tested by just passing it a mock memory 
reader.

https://github.com/llvm/llvm-project/pull/102536
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Realpath symlinks for breakpoints (PR #102223)

2024-08-15 Thread Pavel Labath via lldb-commits

https://github.com/labath commented:

The deps seem fine. I like the approach you've chosen.

I'll leave the rest to other reviewers.

https://github.com/llvm/llvm-project/pull/102223
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] Reland "[lldb] Reland 2402b3213c2f with `/H` to debug the windows build issue (PR #101672)

2024-08-15 Thread Pavel Labath via lldb-commits

labath wrote:

`lldbPluginScriptInterpreterPythonScriptedThreadPlanPythonInterface` quite a 
mouthful for just a single cpp file. Maybe all of the interfaces could be a 
part of a single library?

That would end up with something like:
```
D:\build-lldb\tools\lldb\source\Plugins\ScriptInterpreter\Python\Interfaces\CMakeFiles\lldbPluginScriptInterpreterPythonInterfaces.dir\lldbPluginScriptInterpreterPythonInterfaces.pdb
```

.. which is 80 chars shorter?

https://github.com/llvm/llvm-project/pull/101672
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Removed gdbserver ports map from lldb-server (PR #104238)

2024-08-15 Thread Pavel Labath via lldb-commits

https://github.com/labath edited 
https://github.com/llvm/llvm-project/pull/104238
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Removed gdbserver ports map from lldb-server (PR #104238)

2024-08-15 Thread Pavel Labath via lldb-commits


@@ -47,111 +48,10 @@ using namespace llvm;
 
 // option descriptors for getopt_long_only()
 
-#ifdef _WIN32
-typedef pipe_t shared_fd_t;
-const shared_fd_t kInvalidSharedFD = LLDB_INVALID_PIPE;
-#else
-typedef NativeSocket shared_fd_t;
-const shared_fd_t kInvalidSharedFD = Socket::kInvalidSocketValue;
-#endif
-
-class SharedSocket {
-public:
-  SharedSocket(Connection *conn, Status &error) {
-m_fd = kInvalidSharedFD;
-
-const Socket *socket =
-static_cast(conn->GetReadObject().get());
-if (socket == nullptr) {
-  error = Status("invalid conn socket");
-  return;
-}
-
-#ifdef _WIN32
-m_socket = socket->GetNativeSocket();
-
-// Create a pipe to transfer WSAPROTOCOL_INFO to the child process.
-error = m_socket_pipe.CreateNew(true);
-if (error.Fail())
-  return;
-
-m_fd = m_socket_pipe.GetReadPipe();
-#else
-m_fd = socket->GetNativeSocket();
-error = Status();
-#endif
-  }
-
-  shared_fd_t GetSendableFD() { return m_fd; }
-
-  Status CompleteSending(lldb::pid_t child_pid) {
-#ifdef _WIN32
-// Transfer WSAPROTOCOL_INFO to the child process.
-m_socket_pipe.CloseReadFileDescriptor();
-
-WSAPROTOCOL_INFO protocol_info;
-if (::WSADuplicateSocket(m_socket, child_pid, &protocol_info) ==
-SOCKET_ERROR) {
-  int last_error = ::WSAGetLastError();
-  return Status("WSADuplicateSocket() failed, error: %d", last_error);
-}
-
-size_t num_bytes;
-Status error =
-m_socket_pipe.WriteWithTimeout(&protocol_info, sizeof(protocol_info),
-   std::chrono::seconds(10), num_bytes);
-if (error.Fail())
-  return error;
-if (num_bytes != sizeof(protocol_info))
-  return Status("WriteWithTimeout(WSAPROTOCOL_INFO) failed: %d bytes",
-num_bytes);
-#endif
-return Status();
-  }
-
-  static Status GetNativeSocket(shared_fd_t fd, NativeSocket &socket) {
-#ifdef _WIN32
-socket = Socket::kInvalidSocketValue;
-// Read WSAPROTOCOL_INFO from the parent process and create NativeSocket.
-WSAPROTOCOL_INFO protocol_info;
-{
-  Pipe socket_pipe(fd, LLDB_INVALID_PIPE);
-  size_t num_bytes;
-  Status error =
-  socket_pipe.ReadWithTimeout(&protocol_info, sizeof(protocol_info),
-  std::chrono::seconds(10), num_bytes);
-  if (error.Fail())
-return error;
-  if (num_bytes != sizeof(protocol_info)) {
-return Status(
-"socket_pipe.ReadWithTimeout(WSAPROTOCOL_INFO) failed: % d bytes",
-num_bytes);
-  }
-}
-socket = ::WSASocket(FROM_PROTOCOL_INFO, FROM_PROTOCOL_INFO,
- FROM_PROTOCOL_INFO, &protocol_info, 0, 0);
-if (socket == INVALID_SOCKET) {
-  return Status("WSASocket(FROM_PROTOCOL_INFO) failed: error %d",
-::WSAGetLastError());
-}
-return Status();
-#else
-socket = fd;
-return Status();
-#endif
-  }
-
-private:
-#ifdef _WIN32
-  Pipe m_socket_pipe;
-  NativeSocket m_socket;
-#endif
-  shared_fd_t m_fd;
-};
-
 static int g_debug = 0;
 static int g_verbose = 0;
 static int g_server = 0;
+static volatile bool g_listen_gdb = true;

labath wrote:

volatile? really?

https://github.com/llvm/llvm-project/pull/104238
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Removed gdbserver ports map from lldb-server (PR #104238)

2024-08-15 Thread Pavel Labath via lldb-commits


@@ -879,20 +879,12 @@ lldb::thread_result_t 
GDBRemoteCommunication::ListenThread() {
   return {};
 }
 
-Status GDBRemoteCommunication::StartDebugserverProcess(
-const char *url, Platform *platform, ProcessLaunchInfo &launch_info,
-uint16_t *port, const Args *inferior_args, int pass_comm_fd) {
+bool GDBRemoteCommunication::GetDebugserverPath(
+Platform *platform, FileSpec &debugserver_file_spec) {

labath wrote:

```suggestion
FileSpec GDBRemoteCommunication::GetDebugserverPath(
Platform *platform) {
```

just return an invalid FileSpec for error.

https://github.com/llvm/llvm-project/pull/104238
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Removed gdbserver ports map from lldb-server (PR #104238)

2024-08-15 Thread Pavel Labath via lldb-commits


@@ -666,7 +756,23 @@ ConnectionStatus
 ConnectionFileDescriptor::ConnectFD(llvm::StringRef s,
 socket_id_callback_type socket_id_callback,
 Status *error_ptr) {
-#if LLDB_ENABLE_POSIX
+#ifdef _WIN32
+  int64_t fd = -1;
+  if (!s.getAsInteger(0, fd)) {
+// Assume we own fd.
+std::unique_ptr tcp_socket =
+std::make_unique((NativeSocket)fd, true, false);
+m_io_sp = std::move(tcp_socket);
+m_uri = s.str();
+return eConnectionStatusSuccess;
+  }

labath wrote:

I don't exactly have an alternative in mind yet, but I'm not sure this is a 
good idea. File descriptors are a thing on windows as well (though, IIUC, 
they're more of like an in-process thing rather than a kernel entity), and we 
do have code which treats them as such, so I think it could be confusing that 
this functions considers an "fd" to be a socket handle.

https://github.com/llvm/llvm-project/pull/104238
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Removed gdbserver ports map from lldb-server (PR #104238)

2024-08-15 Thread Pavel Labath via lldb-commits


@@ -40,83 +40,20 @@
 
 #include "lldb/Utility/StringExtractorGDBRemote.h"
 
+#include "../tools/lldb-server/Acceptor.h"

labath wrote:

We will need to move this someplace else (probably `Host`)

https://github.com/llvm/llvm-project/pull/104238
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Removed gdbserver ports map from lldb-server (PR #104238)

2024-08-15 Thread Pavel Labath via lldb-commits


@@ -601,15 +570,24 @@ int main_platform(int argc, char *argv[]) {
   // If not running as a server, this process will not accept
   // connections while a connection is active.
   acceptor_up.reset();
-
-  // When not running in server mode, use all available ports
-  platform.SetPortMap(std::move(gdbserver_portmap));
 }
 
 platform.SetConnection(std::unique_ptr(conn));
 client_handle(platform, inferior_arguments);
   } while (g_server);
 
+  // FIXME: Make TCPSocket::CloseListenSockets() public and implement
+  // Acceptor::Close().
+  /*
+  if (acceptor_gdb && gdb_thread.IsJoinable()) {
+g_listen_gdb = false;
+static_cast(acceptor_gdb->m_listener_socket_up.get())
+->CloseListenSockets();
+lldb::thread_result_t result;
+gdb_thread.Join(&result);
+  }
+  */
+

labath wrote:

This won't work. This is exactly the pattern from that kernel bug we were 
arguing over. It's just not possible to safely close an fd while another thread 
may be operating on it.

You need to use an out-of-band mechanism (e.g. a pipe) to notify the thread 
that it's time to terminate. However, since that will involve some form of 
multiplexing (e.g. `select(2)`). I would strongly prefer to just multiplex over 
the two sockets we are accepting. We should be able to use the `MainLoop` class 
for that -- just register the two sockets with the class, and let it call the 
right callback when needed. Basically, the code should look similar to 
[this](https://github.com/llvm/llvm-project/blob/f71b63865140cf3c286baf3a77ba3e467f929504/lldb/source/Host/common/TCPSocket.cpp#L267),
 except that it will be listening to two unrelated sockets.

https://github.com/llvm/llvm-project/pull/104238
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Removed gdbserver ports map from lldb-server (PR #104238)

2024-08-15 Thread Pavel Labath via lldb-commits


@@ -20,6 +20,10 @@
 #include 
 #include 
 
+#ifdef SendMessage
+#undef SendMessage
+#endif
+

labath wrote:

I guess that means we're including a windows header somewhere. Any chance to 
avoid that? Llvm is pretty strict about that. Lldb is not, though I think it's 
a good practice.

https://github.com/llvm/llvm-project/pull/104238
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Removed gdbserver ports map from lldb-server (PR #104238)

2024-08-15 Thread Pavel Labath via lldb-commits

https://github.com/labath edited 
https://github.com/llvm/llvm-project/pull/104238
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Removed gdbserver ports map from lldb-server (PR #104238)

2024-08-15 Thread Pavel Labath via lldb-commits

https://github.com/labath commented:

The termination part definitely needs to be done differently, but overall, I 
think this is a pretty good start.

(PSA: I will be OOO all of next week.)

https://github.com/llvm/llvm-project/pull/104238
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] Reapply "[lldb] Tolerate multiple compile units with the same DWO ID (#100577)" (PR #104041)

2024-08-15 Thread Pavel Labath via lldb-commits

https://github.com/labath closed 
https://github.com/llvm/llvm-project/pull/104041
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] Fix single thread stepping timeout race condition (PR #104195)

2024-08-15 Thread Pavel Labath via lldb-commits

https://github.com/labath approved this pull request.

This indeed fixes the race condition. I can confirm that the test no longer 
hangs after with this patch. I'm glad we were able to figure this out.

*However*, this doesn't appear to be the only problem here. With the hang out 
of the way, I can now see another failure mode, which appears with a lower (say 
~0.1%) frequency.

Judging by the [logs](https://paste.debian.net/hidden/30235a5c/), the problem 
is triggered when the interrupt packet (^C) arrives too soon -- before the 
inferior has had a chance to execute anything. I haven't looked at the code, 
but my guess would be that the step-over thread plan erroneously decides that 
it's done (because it's back at the function it wanted to be in, but it does 
not realize that's because the PC hasn't been moved).

If you want to reproduce this yourself, I've found that inserting a 10ms sleep 
[here](https://github.com/llvm/llvm-project/blob/7898866065f6c9b72b5fa3e45e565baf8a5e7609/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp#L1005)
 makes the failure reproduce in about 50% of cases (as it increases the 
likelyhood that the interrupt will arrive "too soon" -- you could probably 
achieve the same thing by decreasing the single thread timeout) 

https://github.com/llvm/llvm-project/pull/104195
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Add Populate Methods for ELFLinuxPrPsInfo and ELFLinuxPrStatus (PR #104109)

2024-08-14 Thread Pavel Labath via lldb-commits


@@ -0,0 +1,162 @@
+//===-- ThreadElfCoreTest.cpp *- C++
+//-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+#include "Plugins/Process/elf-core/ThreadElfCore.h"
+#include "Plugins/Platform/Linux/PlatformLinux.h"
+#include "TestingSupport/TestUtilities.h"
+#include "lldb/Core/Debugger.h"
+#include "lldb/Host/FileSystem.h"
+#include "lldb/Host/HostInfo.h"
+#include "lldb/Target/Process.h"
+#include "lldb/Target/Target.h"
+#include "lldb/Target/Thread.h"
+#include "lldb/Utility/Listener.h"
+#include "llvm/TargetParser/Triple.h"
+#include "gtest/gtest.h"
+
+#include 
+#include 
+#include 
+#include 
+
+using namespace lldb_private;
+
+namespace {
+
+struct ElfCoreTest : public testing::Test {
+  static void SetUpTestCase() {
+FileSystem::Initialize();
+HostInfo::Initialize();
+platform_linux::PlatformLinux::Initialize();
+std::call_once(TestUtilities::g_debugger_initialize_flag,
+   []() { Debugger::Initialize(nullptr); });
+  }
+  static void TearDownTestCase() {
+platform_linux::PlatformLinux::Terminate();
+HostInfo::Terminate();
+FileSystem::Terminate();
+  }
+};
+
+struct DummyProcess : public Process {
+  DummyProcess(lldb::TargetSP target_sp, lldb::ListenerSP listener_sp)
+  : Process(target_sp, listener_sp) {
+SetID(getpid());
+  }
+
+  bool CanDebug(lldb::TargetSP target, bool plugin_specified_by_name) override 
{
+return true;
+  }
+  Status DoDestroy() override { return {}; }
+  void RefreshStateAfterStop() override {}
+  size_t DoReadMemory(lldb::addr_t vm_addr, void *buf, size_t size,
+  Status &error) override {
+return 0;
+  }
+  bool DoUpdateThreadList(ThreadList &old_thread_list,
+  ThreadList &new_thread_list) override {
+return false;
+  }
+  llvm::StringRef GetPluginName() override { return "Dummy"; }
+};
+
+struct DummyThread : public Thread {
+  using Thread::Thread;
+
+  ~DummyThread() override { DestroyThread(); }
+
+  void RefreshStateAfterStop() override {}
+
+  lldb::RegisterContextSP GetRegisterContext() override { return nullptr; }
+
+  lldb::RegisterContextSP
+  CreateRegisterContextForFrame(StackFrame *frame) override {
+return nullptr;
+  }
+
+  bool CalculateStopInfo() override { return false; }
+};
+
+lldb::TargetSP CreateTarget(lldb::DebuggerSP &debugger_sp, ArchSpec &arch) {
+  lldb::PlatformSP platform_sp;
+  lldb::TargetSP target_sp;
+  debugger_sp->GetTargetList().CreateTarget(
+  *debugger_sp, "", arch, eLoadDependentsNo, platform_sp, target_sp);
+  return target_sp;
+}
+
+lldb::ThreadSP CreateThread(lldb::ProcessSP &process_sp) {
+  lldb::ThreadSP thread_sp =
+  std::make_shared(*process_sp.get(), gettid());
+  if (thread_sp == nullptr) {
+return nullptr;
+  }
+  process_sp->GetThreadList().AddThread(thread_sp);
+
+  return thread_sp;
+}
+
+} // namespace
+
+TEST_F(ElfCoreTest, PopulatePrpsInfoTest) {
+  ArchSpec arch{HostInfo::GetTargetTriple()};
+  lldb::DebuggerSP debugger_sp = Debugger::CreateInstance();
+  ASSERT_TRUE(debugger_sp);
+
+  lldb::TargetSP target_sp = CreateTarget(debugger_sp, arch);
+  ASSERT_TRUE(target_sp);
+
+  lldb::ListenerSP listener_sp(Listener::MakeListener("dummy"));
+  lldb::ProcessSP process_sp =
+  std::make_shared(target_sp, listener_sp);
+  ASSERT_TRUE(process_sp);
+  auto prpsinfo_opt = ELFLinuxPrPsInfo::Populate(process_sp);

labath wrote:

This would be easier to test if the API was like 
`ELFLinuxPrPsInfo::Populate(ProcessInfo, State)`, as then you could call the 
function with values fully under your control. That way you could e.g. test 
boundary conditions on the arguments string (the most complicated part of the 
function).

If you want you can still have an overload which takes a process (and calls 
this function), but given that it will likely only ever have a single caller, 
it could just be inlined into it.

https://github.com/llvm/llvm-project/pull/104109
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Add Populate Methods for ELFLinuxPrPsInfo and ELFLinuxPrStatus (PR #104109)

2024-08-14 Thread Pavel Labath via lldb-commits


@@ -394,6 +420,114 @@ Status ELFLinuxPrPsInfo::Parse(const DataExtractor &data,
   return error;
 }
 
+std::optional
+ELFLinuxPrPsInfo::Populate(const lldb::ProcessSP &process_sp) {
+  ELFLinuxPrPsInfo prpsinfo{};
+  prpsinfo.pr_pid = process_sp->GetID();
+  ProcessInstanceInfo info;
+  if (!process_sp->GetProcessInfo(info)) {
+return std::nullopt;
+  }
+  prpsinfo.pr_nice = info.GetPriorityValue().value_or(0);
+  prpsinfo.pr_zomb = 0;
+  if (auto zombie_opt = info.IsZombie(); zombie_opt.value_or(false)) {
+prpsinfo.pr_zomb = 1;
+  }
+  /**
+   * In the linux kernel this comes from:
+   * state = READ_ONCE(p->__state);
+   * i = state ? ffz(~state) + 1 : 0;
+   * psinfo->pr_sname = (i > 5) ? '.' : "RSDTZW"[i];
+   *
+   * So we replicate that here. From proc_pid_stats(5)
+   * R = Running
+   * S = Sleeping on uninterrutible wait
+   * D = Waiting on uninterruptable disk sleep
+   * T = Tracing stop
+   * Z = Zombie
+   * W = Paging
+   */
+  lldb::StateType process_state = process_sp->GetState();
+  switch (process_state) {
+  case lldb::StateType::eStateSuspended:
+prpsinfo.pr_sname = 'S';
+prpsinfo.pr_state = 1;
+break;
+  case lldb::StateType::eStateStopped:
+[[fallthrough]];
+  case lldb::StateType::eStateStepping:
+prpsinfo.pr_sname = 'T';
+prpsinfo.pr_state = 3;
+break;
+  case lldb::StateType::eStateUnloaded:
+[[fallthrough]];
+  case lldb::StateType::eStateRunning:
+prpsinfo.pr_sname = 'R';
+prpsinfo.pr_state = 0;
+break;
+  default:
+break;
+  }
+
+  /**
+   * pr_flags is left as 0. The values (in linux) are specific
+   * to the kernel. We recover them from the proc filesystem
+   * but don't put them in ProcessInfo because it would really
+   * become very linux specific and the utility here seems pretty
+   * dubious
+   */
+
+  if (info.EffectiveUserIDIsValid()) {
+prpsinfo.pr_uid = info.GetUserID();
+  }
+  if (info.EffectiveGroupIDIsValid()) {
+prpsinfo.pr_gid = info.GetGroupID();
+  }
+
+  if (info.ParentProcessIDIsValid()) {
+prpsinfo.pr_ppid = info.GetParentProcessID();
+  }
+
+  if (info.ProcessGroupIDIsValid()) {
+prpsinfo.pr_pgrp = info.GetProcessGroupID();
+  }
+
+  if (info.ProcessSessionIDIsValid()) {
+prpsinfo.pr_sid = info.GetProcessSessionID();
+  }
+  constexpr size_t fname_len = std::extent_v;
+  static_assert(fname_len > 0, "This should always be non zero");
+  const llvm::StringRef fname = info.GetNameAsStringRef();
+  auto fname_begin = fname.begin();
+  std::copy(fname_begin,
+std::next(fname_begin, std::min(fname_len, fname.size())),
+prpsinfo.pr_fname);
+  prpsinfo.pr_fname[fname_len - 1] = '\0';
+  auto args = info.GetArguments();
+  if (!args.empty()) {

labath wrote:

The code does the right thing even without this check.

https://github.com/llvm/llvm-project/pull/104109
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Add Populate Methods for ELFLinuxPrPsInfo and ELFLinuxPrStatus (PR #104109)

2024-08-14 Thread Pavel Labath via lldb-commits


@@ -394,6 +420,114 @@ Status ELFLinuxPrPsInfo::Parse(const DataExtractor &data,
   return error;
 }
 
+std::optional
+ELFLinuxPrPsInfo::Populate(const lldb::ProcessSP &process_sp) {
+  ELFLinuxPrPsInfo prpsinfo{};
+  prpsinfo.pr_pid = process_sp->GetID();
+  ProcessInstanceInfo info;
+  if (!process_sp->GetProcessInfo(info)) {
+return std::nullopt;
+  }
+  prpsinfo.pr_nice = info.GetPriorityValue().value_or(0);
+  prpsinfo.pr_zomb = 0;
+  if (auto zombie_opt = info.IsZombie(); zombie_opt.value_or(false)) {
+prpsinfo.pr_zomb = 1;
+  }
+  /**
+   * In the linux kernel this comes from:
+   * state = READ_ONCE(p->__state);
+   * i = state ? ffz(~state) + 1 : 0;
+   * psinfo->pr_sname = (i > 5) ? '.' : "RSDTZW"[i];
+   *
+   * So we replicate that here. From proc_pid_stats(5)
+   * R = Running
+   * S = Sleeping on uninterrutible wait
+   * D = Waiting on uninterruptable disk sleep
+   * T = Tracing stop
+   * Z = Zombie
+   * W = Paging
+   */
+  lldb::StateType process_state = process_sp->GetState();
+  switch (process_state) {
+  case lldb::StateType::eStateSuspended:
+prpsinfo.pr_sname = 'S';
+prpsinfo.pr_state = 1;
+break;
+  case lldb::StateType::eStateStopped:
+[[fallthrough]];
+  case lldb::StateType::eStateStepping:
+prpsinfo.pr_sname = 'T';
+prpsinfo.pr_state = 3;
+break;
+  case lldb::StateType::eStateUnloaded:
+[[fallthrough]];
+  case lldb::StateType::eStateRunning:
+prpsinfo.pr_sname = 'R';
+prpsinfo.pr_state = 0;
+break;
+  default:
+break;
+  }
+
+  /**
+   * pr_flags is left as 0. The values (in linux) are specific
+   * to the kernel. We recover them from the proc filesystem
+   * but don't put them in ProcessInfo because it would really
+   * become very linux specific and the utility here seems pretty
+   * dubious
+   */
+
+  if (info.EffectiveUserIDIsValid()) {
+prpsinfo.pr_uid = info.GetUserID();
+  }
+  if (info.EffectiveGroupIDIsValid()) {
+prpsinfo.pr_gid = info.GetGroupID();
+  }
+
+  if (info.ParentProcessIDIsValid()) {
+prpsinfo.pr_ppid = info.GetParentProcessID();
+  }
+
+  if (info.ProcessGroupIDIsValid()) {
+prpsinfo.pr_pgrp = info.GetProcessGroupID();
+  }
+
+  if (info.ProcessSessionIDIsValid()) {
+prpsinfo.pr_sid = info.GetProcessSessionID();
+  }
+  constexpr size_t fname_len = std::extent_v;
+  static_assert(fname_len > 0, "This should always be non zero");
+  const llvm::StringRef fname = info.GetNameAsStringRef();
+  auto fname_begin = fname.begin();
+  std::copy(fname_begin,
+std::next(fname_begin, std::min(fname_len, fname.size())),
+prpsinfo.pr_fname);
+  prpsinfo.pr_fname[fname_len - 1] = '\0';
+  auto args = info.GetArguments();
+  if (!args.empty()) {
+constexpr size_t psargs_len = std::extent_v;
+static_assert(psargs_len > 0, "This should always be non zero");
+size_t i = psargs_len;
+auto argentry_iterator = std::begin(args);
+char *psargs = prpsinfo.pr_psargs;
+while (i > 0 && argentry_iterator != args.end()) {
+  llvm::StringRef argentry = argentry_iterator->ref();
+  size_t len = std::min(i, argentry.size());
+  auto arg_iterator = std::begin(argentry);
+  std::copy(arg_iterator, std::next(arg_iterator, len), psargs);
+  i -= len;
+  psargs += len;
+  if (i > 0) {
+*(psargs++) = ' ';
+--i;
+  }
+  ++argentry_iterator;
+}

labath wrote:

```suggestion
auto argentry_iterator = std::begin(args);
char *psargs = prpsinfo.pr_psargs;
char *psargs_end = std::end(prpsinfo.pr_psargs);
while (psargs < psargs_end && argentry_iterator != args.end()) {
  llvm::StringRef argentry = argentry_iterator->ref();
  size_t len = std::min(std::distance(psargs, psargs_end), argentry.size());
  auto arg_iterator = std::begin(argentry);
  psargs = std::copy_n(arg_iterator, len, psargs);
  if (psargs != psargs_end)
*(psargs++) = ' ';
  ++argentry_iterator;
}
```

I noticed you like iterators, so I've put in more of them. :)

https://github.com/llvm/llvm-project/pull/104109
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Add Populate Methods for ELFLinuxPrPsInfo and ELFLinuxPrStatus (PR #104109)

2024-08-14 Thread Pavel Labath via lldb-commits


@@ -394,6 +420,114 @@ Status ELFLinuxPrPsInfo::Parse(const DataExtractor &data,
   return error;
 }
 
+std::optional
+ELFLinuxPrPsInfo::Populate(const lldb::ProcessSP &process_sp) {
+  ELFLinuxPrPsInfo prpsinfo{};
+  prpsinfo.pr_pid = process_sp->GetID();
+  ProcessInstanceInfo info;
+  if (!process_sp->GetProcessInfo(info)) {
+return std::nullopt;
+  }
+  prpsinfo.pr_nice = info.GetPriorityValue().value_or(0);
+  prpsinfo.pr_zomb = 0;
+  if (auto zombie_opt = info.IsZombie(); zombie_opt.value_or(false)) {
+prpsinfo.pr_zomb = 1;
+  }
+  /**
+   * In the linux kernel this comes from:
+   * state = READ_ONCE(p->__state);
+   * i = state ? ffz(~state) + 1 : 0;
+   * psinfo->pr_sname = (i > 5) ? '.' : "RSDTZW"[i];
+   *
+   * So we replicate that here. From proc_pid_stats(5)
+   * R = Running
+   * S = Sleeping on uninterrutible wait
+   * D = Waiting on uninterruptable disk sleep
+   * T = Tracing stop
+   * Z = Zombie
+   * W = Paging
+   */
+  lldb::StateType process_state = process_sp->GetState();
+  switch (process_state) {
+  case lldb::StateType::eStateSuspended:
+prpsinfo.pr_sname = 'S';
+prpsinfo.pr_state = 1;
+break;
+  case lldb::StateType::eStateStopped:
+[[fallthrough]];
+  case lldb::StateType::eStateStepping:
+prpsinfo.pr_sname = 'T';
+prpsinfo.pr_state = 3;
+break;
+  case lldb::StateType::eStateUnloaded:
+[[fallthrough]];
+  case lldb::StateType::eStateRunning:
+prpsinfo.pr_sname = 'R';
+prpsinfo.pr_state = 0;
+break;
+  default:
+break;
+  }
+
+  /**
+   * pr_flags is left as 0. The values (in linux) are specific
+   * to the kernel. We recover them from the proc filesystem
+   * but don't put them in ProcessInfo because it would really
+   * become very linux specific and the utility here seems pretty
+   * dubious
+   */
+
+  if (info.EffectiveUserIDIsValid()) {
+prpsinfo.pr_uid = info.GetUserID();
+  }
+  if (info.EffectiveGroupIDIsValid()) {
+prpsinfo.pr_gid = info.GetGroupID();
+  }
+
+  if (info.ParentProcessIDIsValid()) {
+prpsinfo.pr_ppid = info.GetParentProcessID();
+  }
+
+  if (info.ProcessGroupIDIsValid()) {
+prpsinfo.pr_pgrp = info.GetProcessGroupID();
+  }
+
+  if (info.ProcessSessionIDIsValid()) {
+prpsinfo.pr_sid = info.GetProcessSessionID();
+  }
+  constexpr size_t fname_len = std::extent_v;
+  static_assert(fname_len > 0, "This should always be non zero");
+  const llvm::StringRef fname = info.GetNameAsStringRef();
+  auto fname_begin = fname.begin();
+  std::copy(fname_begin,
+std::next(fname_begin, std::min(fname_len, fname.size())),
+prpsinfo.pr_fname);

labath wrote:

```suggestion
  std::copy_n(fname_begin,
std::min(fname_len, fname.size()),
prpsinfo.pr_fname);
```

https://github.com/llvm/llvm-project/pull/104109
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Add Populate Methods for ELFLinuxPrPsInfo and ELFLinuxPrStatus (PR #104109)

2024-08-14 Thread Pavel Labath via lldb-commits

https://github.com/labath commented:

I like how you've made a separate patch out of these functions. Just a couple 
of random improvements I noticed.

https://github.com/llvm/llvm-project/pull/104109
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


[Lldb-commits] [lldb] [lldb] Add Populate Methods for ELFLinuxPrPsInfo and ELFLinuxPrStatus (PR #104109)

2024-08-14 Thread Pavel Labath via lldb-commits

https://github.com/labath edited 
https://github.com/llvm/llvm-project/pull/104109
___
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits


<    2   3   4   5   6   7   8   9   10   11   >