Re: [lldb-dev] LLDB Evolution - Final Form

Mehdi Amini via lldb-dev Tue, 20 Sep 2016 10:48:13 -0700

> On Sep 20, 2016, at 10:33 AM, Greg Clayton <[email protected]> wrote:
> 
>> 
>> On Sep 19, 2016, at 2:46 PM, Mehdi Amini <[email protected]> wrote:
>> 
>>> 
>>> On Sep 19, 2016, at 2:34 PM, Greg Clayton <[email protected]> wrote:
>>> 
>>>> 
>>>> On Sep 19, 2016, at 2:20 PM, Mehdi Amini <[email protected]> wrote:
>>>> 
>>>>> 
>>>>> On Sep 19, 2016, at 2:02 PM, Greg Clayton via lldb-dev 
>>>>> <[email protected]> wrote:
>>>>> 
>>>>> 
>>>>>> On Sep 19, 2016, at 1:18 PM, Zachary Turner via lldb-dev 
>>>>>> <[email protected]> wrote:
>>>>>> 
>>>>>> Following up with Kate's post from a few weeks ago, I think the dust has 
>>>>>> settled on the code reformat and it went over pretty smoothly for the 
>>>>>> most part.  So I thought it might be worth throwing out some ideas for 
>>>>>> where we go from here.  I have a large list of ideas (more ideas than 
>>>>>> time, sadly) that I've been collecting over the past few weeks, so I 
>>>>>> figured I would throw them out in the open for discussion.
>>>>>> 
>>>>>> I’ve grouped the areas for improvement into 3 high level categories.
>>>>>> 
>>>>>>  • De-inventing the wheel - We should use more code from LLVM, and 
>>>>>> delete code in LLDB where LLVM provides a solution.  In cases where 
>>>>>> there is an LLVM thing that is *similar* to what we need, we should 
>>>>>> extend the LLVM thing to support what we need, and then use it.  
>>>>>> Following are some areas I've identified.  This list is by no means 
>>>>>> complete.  For each one, I've given a personal assessment of how likely 
>>>>>> it is to cause some (temporary) hiccups, how much it would help us in 
>>>>>> the long run, and how difficult it would be to do.  Without further ado:
>>>>>>          • Use llvm::Regex instead of lldb::Regex
>>>>>>                  • llvm::Regex doesn’t support enhanced mode.  Could we 
>>>>>> add support for this to llvm::Regex?
>>>>>>                  • Risk: 6
>>>>>>                  • Impact: 3
>>>>>>                  • Difficulty / Effort: 3  (5 if we have to add enhanced 
>>>>>> mode support)
>>>>> 
>>>>> As long as it supports all of the features, then this is fine. Otherwise 
>>>>> we will need to modify the regular expressions to work with the LLVM 
>>>>> version or back port any advances regex features we need into the LLVM 
>>>>> regex parser. 
>>>>> 
>>>>>>          • Use llvm streams instead of lldb::StreamString
>>>>>>                  • Supports output re-targeting (stderr, stdout, 
>>>>>> std::string, etc), printf style formatting, and type-safe streaming 
>>>>>> operators.
>>>>>>                  • Interoperates nicely with many existing llvm utility 
>>>>>> classes
>>>>>>                  • Risk: 4
>>>>>>                  • Impact: 5
>>>>>>                  • Difficulty / Effort: 7
>>>>> 
>>>>> I have worked extensively with both C++ streams and with printf. I don't 
>>>>> find the type safe streaming operators to be readable and a great way to 
>>>>> place things into streams. Part of my objection to this will be quelled 
>>>>> if the LLVM streams are not stateful like C++ streams are (add an extra 
>>>>> "std::hex" into the stream and suddenly other things down stream start 
>>>>> coming out in hex). As long as printf functionality is in the LLVM 
>>>>> streams I am ok with it.
>>>>> 
>>>>>>          • Use llvm::Error instead of lldb::Error
>>>>>>                  • llvm::Error is an error class that *requires* you to 
>>>>>> check whether it succeeded or it will assert.  In a way, it's similar to 
>>>>>> a C++ exception, except that it doesn't come with the performance hit 
>>>>>> associated with exceptions.  It's extensible, and can be easily extended 
>>>>>> to support the various ways LLDB needs to construct errors and error 
>>>>>> messages.
>>>>>>                  • Would need to first rename lldb::Error to LLDBError 
>>>>>> so that te conversion from LLDBError to llvm::Error could be done 
>>>>>> incrementally.
>>>>>>                  • Risk: 7
>>>>>>                  • Impact: 7
>>>>>>                  • Difficulty / Effort: 8
>>>>> 
>>>>> We must be very careful here. Nothing can crash LLDB and adding something 
>>>>> that will be known to crash in some cases can only be changed if there is 
>>>>> a test that can guarantee that it won't crash. assert() calls in a shared 
>>>>> library like LLDB are not OK and shouldn't fire. Even if they do, when 
>>>>> asserts are off, then it shouldn't crash LLDB. We have made efforts to 
>>>>> only use asserts in LLDB for things that really can't happen, but for 
>>>>> things like constructing a StringRef with NULL is one example of why I 
>>>>> don't want to use certain classes in LLVM. If we can remove the crash 
>>>>> threat, then lets use them. But many people firmly believe that asserts 
>>>>> belong in llvm and clang code and I don't agree.
>>>> 
>>>> I’m surprise by your aversion to assertions, what is your suggested 
>>>> alternative? Are you expecting to check and handle every possible 
>>>> invariants everywhere and recover (or signal an error) properly? That does 
>>>> not seem practical, and it seems to me that it'd lead to just involving UB 
>>>> (or breaking design invariant) without actually noticing it.
>>> 
>>> We have to as a debugger. We get input from a variety of different 
>>> compilers and we parse and use things there were produced by our toolchains 
>>> and others. Crashing Xcode because someone accidentally created a 
>>> llvm::StringRef(NULL) should not happen. It is not OK for library code to 
>>> crash. This is quite OK for compilers, linkers and other tools, but it 
>>> isn't for any other code. 
>> 
>> Well, except that no, it is not OK for the compiler either, we want to use 
>> it as a library (JIT, etc.).
>> But it comes back to my point: you are against asserting for enforcing 
>> “external input” where there should be sanitization done with proper error 
>> handling, which does not say anything about the majority of assertions which 
>> are not dealing with user input.
> 
> I am all for asserting in debug builds. Just not in release builds.


Right, we’re on the same page, and that’s what we do in LLVM.


> And just because you assert, this doesn't mean it is ok to not handle what is 
> being asserted. There is code in clang like:
> 
> void do_something(foo *p)
> {
>    assert(p);
>    p->crash();
> }
> 
> This should be:
> 
> void do_something(foo *p)
> {
>    assert(p);
>    if (p)
>       p->crash();
> }

Well to be honest I wouldn’t use a pointer in the first place but a reference 
in such case.
Otherwise, I’d be fine with such code inside a private API (inside a LLVM 
library for example), where you sanitize the input at the API boundary and 
assert on invariant inside the library.

>> For instance, it is still not clear to me what’s wrong with the asserting 
>> Error class mentioned before.
> 
> If someone adds code that succeeds almost all of the time, including in the 
> test suite, and then we release LLDB and someone uses bad debug info and the 
> error condition suddenly fires and they didn't check the error, it is not 
> acceptable to crash.

This is not what it is about, the assertion fires when the error is 
*unchecked*, independently of success or failure, for example:

ErrorOr<Foo> getFoo();
Foo bar() {
  auto F = getFoo();
  return F.getValue();
}

Here the developer does not check the returned error from the call to getFoo(). 
This is a case where bad-things would happen when there is an actual error, but 
it wouldn’t be caught during testing until the error actually happens.

The LLVM Error class will *always* assert, even when getFoo() succeeds and 
there is no actual error.

—
Mehdi





>> 
>> 
>>> Since there are so many asserts, LLDB code is currently responsible for 
>>> figuring out what will make clang unhappy and we must be sure to not feed 
>>> it anything that makes it unhappy or we crash. So I don't see that as 
>>> better.
>>> 
>>>> 
>>>>> Also many asserts are in header files so even if you build llvm and clang 
>>>>> with asserts off, but then build our project that uses LLVM with asserts 
>>>>> on, you get LLVM and clang asserts when you don't want them.
>>>> 
>>>> It is not really supported to use the LLVM C++ headers without assertions 
>>>> and link to a LLVM build with assertions (and vice-versa).
>>> 
>>> As we have found out. We claim llvm and clang can be linked into tools a 
>>> libraries, but it really means, you should really use it out of process 
>>> because it might crash at any time so don't try and use it in a program you 
>>> actually want to be able run for a long time.
>> 
>> I’m not sure how this sentence relates to the fact that the headers are not 
>> ABI insensitive to -DDEBUG.
>> It seems totally orthogonal.
> 
> It causes us to crash. There is nothing that currently enforces the fact that 
> you can build llvm/clang without asserts yet use the headers with asserts on. 
> It happens, and it is happening to us. If you are designing a library, then 
> asserts in header files are just a bad idea. Very easy to avoid and fix.
> 
>> 
>> — 
>> Mehdi
>> 
>> 
>>> 
>>>> 
>>>> 
>>>>> 
>>>>>>          • StringRef instead of const char *, len everywhere
>>>>>>                  • Can do most common string operations in a way that is 
>>>>>> guaranteed to be safe.
>>>>>>                  • Reduces string manipulation algorithm complexity by 
>>>>>> an order of magnitude.
>>>>>>                  • Can potentially eliminate tens of thousands of string 
>>>>>> copies across the codebase.
>>>>>>                  • Simplifies code.
>>>>>>                  • Risk: 3
>>>>>>                  • Impact: 8
>>>>>>                  • Difficulty / Effort: 7
>>>>> 
>>>>> I don't think StringRef needs to be used everywhere, but it many places 
>>>>> it does make sense. For example our public API should not contain any 
>>>>> LLVM classes in the API. "const char *" is a fine parameter for public 
>>>>> functions. I really hate the fact that constructing llvm::StringRef with 
>>>>> NULL causes an assertion. Many people don't know that and don't take that 
>>>>> into account when making their changes. I would love to see the assertion 
>>>>> taken out to tell you the truth. That would make me feel better about 
>>>>> using StringRef in many more places within LLDB as we shouldn't ever 
>>>>> crash due to this. I would expect most internal APIs are safe to move to 
>>>>> llvm::StringRef, but we will need to make sure none of these require a 
>>>>> null terminated string which would cause us to have to call 
>>>>> "str.str().c_str()". So internally, yes we can cut over to more use of 
>>>>> StringRef. But the public API can't be changed.
>>>>> 
>>>>> 
>>>>>>          • ArrayRef instead of const void *, len everywhere
>>>>>>                  • Same analysis as StringRef
>>>>> 
>>>>> Internally yes, external APIs no.
>>>>>>          • MutableArrayRef instead of void *, len everywhere
>>>>>>                  • Same analysis as StringRef
>>>>> Internally yes, external APIs no.
>>>>>>          • Delete ConstString, use a modified StringPool that is 
>>>>>> thread-safe.
>>>>>>                  • StringPool is a non thread-safe version of 
>>>>>> ConstString.
>>>>>>                  • Strings are internally refcounted so they can be 
>>>>>> cleaned up when they are no longer used.  ConstStrings are a large 
>>>>>> source of memory in LLDB, so ref-counting and removing stale strings has 
>>>>>> the potential to be a huge savings.
>>>>>>                  • Risk: 2
>>>>>>                  • Impact: 9
>>>>>>                  • Difficulty / Effort: 6
>>>>> 
>>>>> We can't currently rely on ref counting. We currently hand out "const 
>>>>> char *" as return values from many public API calls and these rely on the 
>>>>> strings living forever. We many many existing clients and we can't change 
>>>>> the public API. So I vote to avoid this. We are already using LLVM string 
>>>>> pools under the covers and we also associate the mangled/demangled names 
>>>>> in the map as is saves us a ton of cycles when parsing stuff as we never 
>>>>> demangle (very expensive) the same name twice. So I don't see the need to 
>>>>> change this as it is already backed by LLVM technology under the covers.
>>>>> 
>>>>>>          • thread_local instead of lldb::ThreadLocal
>>>>>>                  • This fixes a number of bugs on Windows that cannot be 
>>>>>> fixed otherwise, as they require compiler support.
>>>>>>                  • Some other compilers may not support this yet?
>>>>>>                  • Risk: 2
>>>>>>                  • Impact: 3
>>>>>>                  • Difficulty: 3
>>>>> 
>>>>> As long as all compilers support this then this is fine.
>>>>> 
>>>>>>          • Use llvm::cl for the command line arguments to the primary 
>>>>>> lldb executable.
>>>>>>                  • Risk: 2
>>>>>>                  • Impact: 3
>>>>>>                  • Difficulty / Effort: 4
>>>>> 
>>>>> Easy and fine to switch over to. We might need to port some getopt_long 
>>>>> functionality into it if it doesn't handle all of the things that 
>>>>> getopt_long does. For example arguments and options can be interspersed 
>>>>> in getopt_long(). You can also terminate your arguments with "--". It 
>>>>> accepts single dashes for long option names if they don't conflict with 
>>>>> short options. Many things like this.
>>>>> 
>>>>>>  • Testing - Our testing infrastructure is unstable, and our test 
>>>>>> coverage is lacking.  We should take steps to improve this.
>>>>>>          • Port as much as possible to lit
>>>>>>                  • Simple tests should be trivial to port to lit today.  
>>>>>> If nothing else this serves as a proof of concept while increasing the 
>>>>>> speed and stability of the test suite, since lit is a more stable 
>>>>>> harness.
>>>>> 
>>>>> As long a tests use the public API to run test. We are not doing text 
>>>>> scraping. 
>>>>> 
>>>>>>          • Separate testing tools
>>>>>>                  • One question that remains open is how to represent 
>>>>>> the complicated needs of a debugger in lit tests.  Part a) above covers 
>>>>>> the trivial cases, but what about the difficult cases?  In 
>>>>>> https://reviews.llvm.org/D24591 a number of ideas were discussed.  We 
>>>>>> started getting to this idea towards the end, about a separate tool 
>>>>>> which has an interface independent of the command line interface and 
>>>>>> which can be used to test.  lldb-mi was mentioned.  While I have serious 
>>>>>> concerns about lldb-mi due to its poorly written and tested codebase, I 
>>>>>> do agree in principle with the methodology.  In fact, this is the entire 
>>>>>> philosophy behind lit as used with LLVM, clang, lld, etc.  
>>>>> 
>>>>> We have a public API... Why are we going to go and spend _any_ time on 
>>>>> doing anything else? I just don't get it. What a waste of time. We have a 
>>>>> public API. Lets use it. Not simple enough for people? Tough. Testing a 
>>>>> debugger should be done using the public API except when we are actually 
>>>>> trying to test the LLDB command line in pexpect like tests. Those and 
>>>>> only those are fine to covert over to using LIT and use text scraping. 
>>>>> But 95% of our testing should be done using the API that our IDEs use. 
>>>>> Using MI? Please no.
>>>>>> 
>>>>>> I don’t take full credit for this idea.  I had been toying with a 
>>>>>> similar idea for some time, but it was further cemented in an offline 
>>>>>> discussion with a co-worker.  
>>>>>> 
>>>>>> There many small, targeted tools in LLVM (e.g. llc, lli, llvm-objdump, 
>>>>>> etc) whose purpose are to be chained together to do interesting things.  
>>>>>> Instead of a command line api as we think of in LLDB where you type 
>>>>>> commands from an interactive prompt, they have a command line api as you 
>>>>>> would expect from any tool which is launched from a shell.
>>>>>> 
>>>>>> I can imagine many potential candidates for lldb tools of this nature.  
>>>>>> Off the top of my head:
>>>>>>  • lldb-unwind - A tool for testing the unwinder.  Accepts byte code as 
>>>>>> input and passes it through to the unwinder, outputting a compressed 
>>>>>> summary of the steps taken while unwinding, which could be pattern 
>>>>>> matched in lit.  The output format is entirely controlled by the tool, 
>>>>>> and not by the unwinder itself, so it would be stable in the face of 
>>>>>> changes to the underlying unwinder.  Could have various options to 
>>>>>> enable or disable features of the unwinder in order to force the 
>>>>>> unwinder into modes that can be tricky to encounter in the wild.
>>>>>>  • lldb-symbol - A tool for testing symbol resolution.  Could have 
>>>>>> options for testing things like:
>>>>>>          • Determining if a symbol matches an executable
>>>>>>          • looking up a symbol by name in the debug info, and mapping it 
>>>>>> to an address in the process.  
>>>>>>          • Displaying candidate symbols when doing name lookup in a 
>>>>>> particular scope (e.g. while stopped at a breakpoint).
>>>>>>  • lldb-breakpoint - A tool for testing breakpoints and stepping.  
>>>>>> Various options could include:
>>>>>>          • Set breakpoints and out addresses and/or symbol names where 
>>>>>> they were resolved to.
>>>>>>          • Trigger commands, so that when a breakpoint is hit the tool 
>>>>>> could automatically continue and try to run to another breakpoint, etc.
>>>>>>          • options to inspect certain useful pieces of state about an 
>>>>>> inferior, to be matched in lit. 
>>>>>>  • lldb-interpreter - tests the jitter etc.  I don’t know much about 
>>>>>> this, but I don’t see why this couldn’t be tested in a manner similar to 
>>>>>> how lli is tested.
>>>>>>  • lldb-platform - tests lldb local and remote platform interfaces.
>>>>>>  • lldb-cli -- lldb interactive command line.
>>>>>>  • lldb-format - lldb data formatters etc.
>>>>> 
>>>>> 
>>>>> I agree that testing more functionality it a good idea, but if it is 
>>>>> worth testing we should see if we can get it into our public API. If so, 
>>>>> then we test it there. If not, then we come up with another solution. 
>>>>> Even in the alternate solution, it will be something that will probably 
>>>>> create structured data (JSON, Yaml, etc) and then parsed, so I would 
>>>>> rather spend the time getting these things into out public APIs, or test 
>>>>> them vai our public APIs.
>>>>> 
>>>>>>  • Tests NOW, not later.
>>>>>>          • I know we’ve been over this a million times and it’s not 
>>>>>> worth going over the arguments again.  And I know it’s hard to write 
>>>>>> tests, often requiring the invention of new SB APIs.  Hopefully those 
>>>>>> issues will be addressed by above a) and b) above and writing tests will 
>>>>>> be easier.  Vedant Kumar ran some analytics on the various codebases and 
>>>>>> found that LLDB has the lowest test / commit ratio of any LLVM project 
>>>>>> (He didn’t post numbers for lld, so I’m not sure what it is there).
>>>>>>                  • lldb: 287 of the past 1000 commits
>>>>>>                  • llvm: 511 of the past 1000 commits
>>>>>>                  • clang: 622 of the past 1000 commits
>>>>>>                  • compiler-rt: 543 of the past 1000 commits
>>>>>> This is an alarming statistic, and I would love to see this number 
>>>>>> closer to 50%.
>>>>>>  • Code style / development conventions - Aside from just the column 
>>>>>> limitations and bracing styles, there are other areas where LLDB differs 
>>>>>> from LLVM on code style.  We should continue to adopt more of LLVM's 
>>>>>> style where it makes sense.  I've identified a couple of areas 
>>>>>> (incomplete list) which I outline below.  
>>>>>>          • Clean up the mess of cyclical dependencies and properly layer 
>>>>>> the libraries.  This is especially important for things like lldb-server 
>>>>>> that need to link in as little as possible, but regardless it leads to a 
>>>>>> more robust architecture, faster build and link times, better 
>>>>>> testability, and is required if we ever want to do a modules build of 
>>>>>> LLDB
>>>>>>          • Use CMake instead of Xcode project (CMake supports 
>>>>>> Frameworks).  CMake supports Apple Frameworks, so the main roadblock to 
>>>>>> getting this working is just someone doing it.  Segmenting the build 
>>>>>> process by platform doesn't make sense for the upstream, especially when 
>>>>>> there is a perfectly workable solution.  I have no doubt that the 
>>>>>> resulting Xcode workspace generated automatically by CMake will not be 
>>>>>> as "nice" as one that is maintained by hand.  We face this problem with 
>>>>>> Visual Studio on Windows as well.  The solution that most people have 
>>>>>> adopted is to continue using the IDE for code editing and debugging, but 
>>>>>> for actually running the build, use CMake with Ninja.  A similar 
>>>>>> workflow should still be possible with an OSX CMake build, but as I do 
>>>>>> not work every day on a Mac, all I can say is that it's possible, I have 
>>>>>> no idea how impactful it would be on peoples' workflows.  
>>>>>>          • Variable naming conventions
>>>>>>                  • I don’t expect anyone is too fond of LLDB’s naming 
>>>>>> conventions, but if we’re committed to joining the LLVM ecosystem, then 
>>>>>> let’s go all the way.
>>>>> 
>>>>> Hopefully we can get LLVM and Clang to adopt naming member variables with 
>>>>> a prefix... If so, that is my main remaining concern.
>>>>> 
>>>>>>          • Use more modern C++ and less C
>>>>>>                  • Old habits die hard, but this isn’t just a matter of 
>>>>>> style.  It leads to safer, more robust, and less fragile code as well.
>>>>>>          • Shorter functions and classes with more narrowly targeted 
>>>>>> responsibilities
>>>>>>                  • It’s not uncommon to find functions that are hundreds 
>>>>>> (and in a few cases even 1,000+) of lines long.  We really need to be 
>>>>>> better about breaking functions and classes down into smaller 
>>>>>> responsibilities.  This helps not just for someone coming in to read the 
>>>>>> function, but also for testing.  Smaller functions are easier to unit 
>>>>>> test.
>>>>>>          • Convert T foo(X, Y, Error &error) functions to Expected<T> 
>>>>>> foo(X, Y) style (Depends on 1.c)
>>>>>>                  • llvm::Expected is based on the llvm::Error class 
>>>>>> described earlier.  It’s used when a function is supposed to return a 
>>>>>> value, but it could fail.  By packaging the error with the return value, 
>>>>>> it’s impossible to have a situation where you use the return value even 
>>>>>> in case of an error, and because llvm::Error has mandatory checking, 
>>>>>> it’s also impossible to have a sitaution where you don’t check the 
>>>>>> error.  So it’s very safe.  
>>>>> 
>>>>> As crashes if you don't check the errors. One of the big differences 
>>>>> between LLDB and LLVM/Clang is that asserts are used liberally all over 
>>>>> clang which make the code very crashy when used in production with 
>>>>> uncontrolled input like we get in the debugger. We will need to be very 
>>>>> careful with any changes to make sure we don't increase crash frequency.
>>>>> 
>>>>>> 
>>>>>> Whew.  That was a lot.  If you made it this far, thanks for reading!
>>>>>> 
>>>>>> Obviously if we were to embark on all of the above, it would take many 
>>>>>> months to complete everything.  So I'm not proposing anyone stop what 
>>>>>> they're doing to work on this.  This is just my own personal wishlist
>>>>> 
>>>>> There are many great things in here. As long as we are careful to not 
>>>>> increase the crash frequency in LLDB I am all for it. My mains areas of 
>>>>> concern are:
>>>>> - public API can't change in its current form
>>>>> - no LLVM or clang classes in the public API as arguments or return 
>>>>> values.
>>>>> - don't crash more by introducing assert heavy code into code paths that 
>>>>> use input from outside sources 
>>>> 
>>>> It seems to me that you are mixing two things: asserting on user inputs vs 
>>>> asserting on internal invariants of the system. LLVM is very intensive 
>>>> about asserting on the second category and this seems fine to me.
>>>> Asserting on external inputs is not great (in LLVM as much as in LLDB).
>>>> 
>>>> The asserting error class above falls into the second category and is a 
>>>> great tool to enforce programmer error
>>>> 
>>>> — 
>>>> Mehdi

_______________________________________________
lldb-dev mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev

Re: [lldb-dev] LLDB Evolution - Final Form

Reply via email to