One important thing I forgot to mention in my previous email (although I
thought I had done so) is that I am using LLDB to execute the target in
single-step mode, thus I am already incurring the 1000x slowdown. Given that,
the extra processing comes practically for free.
In addition, while I currently focus on Darwin on x86-64, I would prefer to
make decisions that lead to a cross-{architecture, language, platform}
solution, ideally without affecting the binary.
Regarding your mmap() interception suggestion, I had also considered it, but
thought that it would require a kernel driver for handling the page faults of
the process in order to function properly, since LD_PRELOAD /
DYLD_INSERT_LIBRARIES wouldn’t work for programs that use syscalls directly or
statically link with libc.
I believe that the initial solution, aka using "image lookup" and "memory
region $sp", would better fulfil my current requirements, so I am going to give
that a try.
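
For reference, here is a rough sketch of how that check might look from LLDB's Python scripting interface. The helper names (classify_address, stack_ranges, classify_in_lldb) are hypothetical and only illustrate the idea. It assumes that an address outside every loaded module resolves to an SBAddress with no valid module, and that the memory region around each thread's $sp covers that thread's whole stack. As Pavel noted, the "heap" verdict is only a best guess.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # half-open interval [start, end)


def classify_address(addr: int, in_module: bool, stacks: Iterable[Range]) -> str:
    """Apply the two checks in order: known module first, then thread stacks."""
    if in_module:
        # Equivalent to "image lookup -a <addr>" returning a match:
        # code, global variables and similar.
        return "module"
    if any(start <= addr < end for start, end in stacks):
        return "stack"
    # Neither check matched; this is a guess, not a guarantee.
    return "probably heap"


def stack_ranges(process) -> List[Range]:
    """Collect the memory region around $sp for every thread of an SBProcess."""
    import lldb  # only importable inside LLDB's embedded Python

    ranges = []
    for thread in process:
        sp = thread.GetFrameAtIndex(0).GetSP()
        info = lldb.SBMemoryRegionInfo()
        if process.GetMemoryRegionInfo(sp, info).Success():
            ranges.append((info.GetRegionBase(), info.GetRegionEnd()))
    return ranges


def classify_in_lldb(target, process, addr: int) -> str:
    """Glue code: query LLDB, then defer to the pure classification logic."""
    # Assumption: an address in no loaded module yields an invalid SBModule.
    in_module = target.ResolveLoadAddress(addr).GetModule().IsValid()
    return classify_address(addr, in_module, stack_ranges(process))
```

From the interactive script interpreter this could be called as classify_in_lldb(lldb.target, lldb.process, addr). Since I am single-stepping anyway, the stack ranges could probably be cached and refreshed only when a thread is created or destroyed.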
Last but not least, I would like to mention that I’ve found your insights
extremely helpful and really appreciated your willingness to help me, so thank
you one more time! 😊
― Vangelis
> On 7 Feb 2020, at 19:39, Pavel Labath <[email protected]> wrote:
>
> Thanks for the explanation, Vangelis.
>
> It sounds like binary instrumentation would be the best approach for this, as
> this is pretty much exactly what msan does. If recompilation is not an
> option, then you might be able to get something to work via lldb, but I
> expect this to be _incredibly_ slow (like 1000x, or more). One thing I might
> consider in your place is some kind of an in-process solution. For instance,
> if you intercept mmap (via LD_PRELOAD or something) then you could set it to map
> all anonymous memory (aka heap) as read-only. This way you'll get a SIGSEGV
> every time somebody tries to write to that address. You could intercept that
> signal and do your analysis there. Assuming heap writes are not very common,
> this might even give you a reasonable performance.
>
> But this is not going to be super easy either. The trickiest part here will
> be resuming the program -- you'll need to remap the page read-write, do a
> single step, and then set it to read-only again.
>
> pl
>
> On Fri, 7 Feb 2020 at 01:40, Vangelis Tsiatsianas <[email protected]> wrote:
> Thank you for your thorough and timely response, Pavel! 🙂
>
> Your suggestions might actually cover completely what I am attempting to
> achieve.
>
> Unfortunately, I am not able to disclose the exact reason I need it, but I
> want to track all heap writes, in order to detect modifications in the heap
> and save both the old and the newly written value.
>
> For now, this translates to tracking common x86 assembly instructions (mov{l,
> w, d, q}) for a single thread. Supporting more “exotic” instructions like
> SIMD, multiple architectures or threads is not currently a goal.
>
> Another method could also be an LLVM instrumentation pass, however I would
> like to avoid recompiling and modifying the binary, thus I focus on LLDB,
> even if I end up missing a few writes that way.
>
> I was initially looking for a more complete, cross-platform solution (see:
> http://lists.llvm.org/pipermail/llvm-dev/2019-November/136876.html), but
> the solution proved to be too time consuming for the timeframe I have
> available for my master’s (ending in March).
>
>
> ― Vangelis
>
>
>> On 7 Feb 2020, at 01:20, Pavel Labath <[email protected]> wrote:
>>
>> In general, getting this kind of information is pretty hard, so lldb does
>> not offer you an out-of-the-box solution for it, but it does give you tools
>> which you can use to approximate that.
>>
>> If I wanted to do something like this, the first thing I'd try to do is run
>> "image lookup -a 0xaddr". If this doesn't return anything then the address
>> does not correspond to any known module. This rules out code, global
>> variables, and similar. Then you can run through all of the threads and do a
>> "memory region $SP", which will give you bounds of the memory allocation
>> around the stack pointer. If your address is in one of these ranges, then
>> it's a stack address. Otherwise, it's probably heap (though you can never be
>> 100% sure of that).
>>
>> However, it's not fully clear to me what it is that you're trying to do
>> here. Maybe if you explain the higher level problem that you're trying to
>> solve, we can come up with a better solution.
>>
>> pl
>>
>> On Thu, 6 Feb 2020 at 07:40, Vangelis Tsiatsianas via lldb-dev
>> <[email protected]> wrote:
>> Hi everyone,
>>
>> I am looking for a way to tell whether a memory address belongs to the heap
>> or not.
>>
>> In other words, I would like to make sure that the address does not reside
>> within any stack frame (even if the stack of the thread has been allocated
>> in the heap) and that it’s not a global variable or instruction.
>>
>> Checking whether it is a valid or correctly allocated address or a
>> memory-mapped file or register is not a goal, so accessing it in order to
>> decide, at the risk of causing a segmentation fault, is an accepted solution.
>>
>> I have been thinking of manually checking the address against the boundaries
>> of each active stack frame, the start and end of the instruction segment and
>> the locations of all global variables.
>>
>> However, I would like to ask whether there are better ways to approach this
>> problem in LLDB.
>>
>> Thank you very much in advance! 🙂
>>
>>
>> ― Vangelis
>>
>> _______________________________________________
>> lldb-dev mailing list
>> [email protected] <mailto:[email protected]>
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>> <https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev>
>