> On Sep 20, 2016, at 10:33 AM, Greg Clayton <[email protected]> wrote:
>
>>
>> On Sep 19, 2016, at 2:46 PM, Mehdi Amini <[email protected]> wrote:
>>
>>>
>>> On Sep 19, 2016, at 2:34 PM, Greg Clayton <[email protected]> wrote:
>>>
>>>>
>>>> On Sep 19, 2016, at 2:20 PM, Mehdi Amini <[email protected]> wrote:
>>>>
>>>>>
>>>>> On Sep 19, 2016, at 2:02 PM, Greg Clayton via lldb-dev
>>>>> <[email protected]> wrote:
>>>>>
>>>>>
>>>>>> On Sep 19, 2016, at 1:18 PM, Zachary Turner via lldb-dev
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Following up with Kate's post from a few weeks ago, I think the dust has
>>>>>> settled on the code reformat and it went over pretty smoothly for the
>>>>>> most part. So I thought it might be worth throwing out some ideas for
>>>>>> where we go from here. I have a large list of ideas (more ideas than
>>>>>> time, sadly) that I've been collecting over the past few weeks, so I
>>>>>> figured I would throw them out in the open for discussion.
>>>>>>
>>>>>> I’ve grouped the areas for improvement into 3 high level categories.
>>>>>>
>>>>>> • De-inventing the wheel - We should use more code from LLVM, and
>>>>>> delete code in LLDB where LLVM provides a solution. In cases where
>>>>>> there is an LLVM thing that is *similar* to what we need, we should
>>>>>> extend the LLVM thing to support what we need, and then use it.
>>>>>> Following are some areas I've identified. This list is by no means
>>>>>> complete. For each one, I've given a personal assessment of how likely
>>>>>> it is to cause some (temporary) hiccups, how much it would help us in
>>>>>> the long run, and how difficult it would be to do. Without further ado:
>>>>>> • Use llvm::Regex instead of lldb::Regex
>>>>>> • llvm::Regex doesn’t support enhanced mode. Could we
>>>>>> add support for this to llvm::Regex?
>>>>>> • Risk: 6
>>>>>> • Impact: 3
>>>>>> • Difficulty / Effort: 3 (5 if we have to add enhanced
>>>>>> mode support)
>>>>>
>>>>> As long as it supports all of the features, then this is fine. Otherwise
>>>>> we will need to modify the regular expressions to work with the LLVM
>>>>> version or back port any advances regex features we need into the LLVM
>>>>> regex parser.
>>>>>
>>>>>> • Use llvm streams instead of lldb::StreamString
>>>>>> • Supports output re-targeting (stderr, stdout,
>>>>>> std::string, etc), printf style formatting, and type-safe streaming
>>>>>> operators.
>>>>>> • Interoperates nicely with many existing llvm utility
>>>>>> classes
>>>>>> • Risk: 4
>>>>>> • Impact: 5
>>>>>> • Difficulty / Effort: 7
>>>>>
>>>>> I have worked extensively with both C++ streams and with printf. I don't
>>>>> find the type safe streaming operators to be readable and a great way to
>>>>> place things into streams. Part of my objection to this will be quelled
>>>>> if the LLVM streams are not stateful like C++ streams are (add an extra
>>>>> "std::hex" into the stream and suddenly other things down stream start
>>>>> coming out in hex). As long as printf functionality is in the LLVM
>>>>> streams I am ok with it.
>>>>>
>>>>>> • Use llvm::Error instead of lldb::Error
>>>>>> • llvm::Error is an error class that *requires* you to
>>>>>> check whether it succeeded or it will assert. In a way, it's similar to
>>>>>> a C++ exception, except that it doesn't come with the performance hit
>>>>>> associated with exceptions. It's extensible, and can be easily extended
>>>>>> to support the various ways LLDB needs to construct errors and error
>>>>>> messages.
>>>>>> • Would need to first rename lldb::Error to LLDBError
>>>>>> so that te conversion from LLDBError to llvm::Error could be done
>>>>>> incrementally.
>>>>>> • Risk: 7
>>>>>> • Impact: 7
>>>>>> • Difficulty / Effort: 8
>>>>>
>>>>> We must be very careful here. Nothing can crash LLDB and adding something
>>>>> that will be known to crash in some cases can only be changed if there is
>>>>> a test that can guarantee that it won't crash. assert() calls in a shared
>>>>> library like LLDB are not OK and shouldn't fire. Even if they do, when
>>>>> asserts are off, then it shouldn't crash LLDB. We have made efforts to
>>>>> only use asserts in LLDB for things that really can't happen, but for
>>>>> things like constructing a StringRef with NULL is one example of why I
>>>>> don't want to use certain classes in LLVM. If we can remove the crash
>>>>> threat, then lets use them. But many people firmly believe that asserts
>>>>> belong in llvm and clang code and I don't agree.
>>>>
>>>> I’m surprise by your aversion to assertions, what is your suggested
>>>> alternative? Are you expecting to check and handle every possible
>>>> invariants everywhere and recover (or signal an error) properly? That does
>>>> not seem practical, and it seems to me that it'd lead to just involving UB
>>>> (or breaking design invariant) without actually noticing it.
>>>
>>> We have to as a debugger. We get input from a variety of different
>>> compilers and we parse and use things there were produced by our toolchains
>>> and others. Crashing Xcode because someone accidentally created a
>>> llvm::StringRef(NULL) should not happen. It is not OK for library code to
>>> crash. This is quite OK for compilers, linkers and other tools, but it
>>> isn't for any other code.
>>
>> Well, except that no, it is not OK for the compiler either, we want to use
>> it as a library (JIT, etc.).
>> But it comes back to my point: you are against asserting for enforcing
>> “external input” where there should be sanitization done with proper error
>> handling, which does not say anything about the majority of assertions which
>> are not dealing with user input.
>
> I am all for asserting in debug builds. Just not in release builds.
Right, we’re on the same page, and that’s what we do in LLVM.
> And just because you assert, this doesn't mean it is ok to not handle what is
> being asserted. There is code in clang like:
>
> void do_something(foo *p)
> {
> assert(p);
> p->crash();
> }
>
> This should be:
>
> void do_something(foo *p)
> {
> assert(p);
> if (p)
> p->crash();
> }
Well to be honest I wouldn’t use a pointer in the first place but a reference
in such case.
Otherwise, I’d be fine with such code inside a private API (inside a LLVM
library for example), where you sanitize the input at the API boundary and
assert on invariant inside the library.
>> For instance, it is still not clear to me what’s wrong with the asserting
>> Error class mentioned before.
>
> If someone adds code that succeeds almost all of the time, including in the
> test suite, and then we release LLDB and someone uses bad debug info and the
> error condition suddenly fires and they didn't check the error, it is not
> acceptable to crash.
This is not what it is about, the assertion fires when the error is
*unchecked*, independently of success or failure, for example:
ErrorOr<Foo> getFoo();
Foo bar() {
auto F = getFoo();
return F.getValue();
}
Here the developer does not check the returned error from the call to getFoo().
This is a case where bad-things would happen when there is an actual error, but
it wouldn’t be caught during testing until the error actually happens.
The LLVM Error class will *always* assert, even when getFoo() succeeds and
there is no actual error.
—
Mehdi
>>
>>
>>> Since there are so many asserts, LLDB code is currently responsible for
>>> figuring out what will make clang unhappy and we must be sure to not feed
>>> it anything that makes it unhappy or we crash. So I don't see that as
>>> better.
>>>
>>>>
>>>>> Also many asserts are in header files so even if you build llvm and clang
>>>>> with asserts off, but then build our project that uses LLVM with asserts
>>>>> on, you get LLVM and clang asserts when you don't want them.
>>>>
>>>> It is not really supported to use the LLVM C++ headers without assertions
>>>> and link to a LLVM build with assertions (and vice-versa).
>>>
>>> As we have found out. We claim llvm and clang can be linked into tools a
>>> libraries, but it really means, you should really use it out of process
>>> because it might crash at any time so don't try and use it in a program you
>>> actually want to be able run for a long time.
>>
>> I’m not sure how this sentence relates to the fact that the headers are not
>> ABI insensitive to -DDEBUG.
>> It seems totally orthogonal.
>
> It causes us to crash. There is nothing that currently enforces the fact that
> you can build llvm/clang without asserts yet use the headers with asserts on.
> It happens, and it is happening to us. If you are designing a library, then
> asserts in header files are just a bad idea. Very easy to avoid and fix.
>
>>
>> —
>> Mehdi
>>
>>
>>>
>>>>
>>>>
>>>>>
>>>>>> • StringRef instead of const char *, len everywhere
>>>>>> • Can do most common string operations in a way that is
>>>>>> guaranteed to be safe.
>>>>>> • Reduces string manipulation algorithm complexity by
>>>>>> an order of magnitude.
>>>>>> • Can potentially eliminate tens of thousands of string
>>>>>> copies across the codebase.
>>>>>> • Simplifies code.
>>>>>> • Risk: 3
>>>>>> • Impact: 8
>>>>>> • Difficulty / Effort: 7
>>>>>
>>>>> I don't think StringRef needs to be used everywhere, but it many places
>>>>> it does make sense. For example our public API should not contain any
>>>>> LLVM classes in the API. "const char *" is a fine parameter for public
>>>>> functions. I really hate the fact that constructing llvm::StringRef with
>>>>> NULL causes an assertion. Many people don't know that and don't take that
>>>>> into account when making their changes. I would love to see the assertion
>>>>> taken out to tell you the truth. That would make me feel better about
>>>>> using StringRef in many more places within LLDB as we shouldn't ever
>>>>> crash due to this. I would expect most internal APIs are safe to move to
>>>>> llvm::StringRef, but we will need to make sure none of these require a
>>>>> null terminated string which would cause us to have to call
>>>>> "str.str().c_str()". So internally, yes we can cut over to more use of
>>>>> StringRef. But the public API can't be changed.
>>>>>
>>>>>
>>>>>> • ArrayRef instead of const void *, len everywhere
>>>>>> • Same analysis as StringRef
>>>>>
>>>>> Internally yes, external APIs no.
>>>>>> • MutableArrayRef instead of void *, len everywhere
>>>>>> • Same analysis as StringRef
>>>>> Internally yes, external APIs no.
>>>>>> • Delete ConstString, use a modified StringPool that is
>>>>>> thread-safe.
>>>>>> • StringPool is a non thread-safe version of
>>>>>> ConstString.
>>>>>> • Strings are internally refcounted so they can be
>>>>>> cleaned up when they are no longer used. ConstStrings are a large
>>>>>> source of memory in LLDB, so ref-counting and removing stale strings has
>>>>>> the potential to be a huge savings.
>>>>>> • Risk: 2
>>>>>> • Impact: 9
>>>>>> • Difficulty / Effort: 6
>>>>>
>>>>> We can't currently rely on ref counting. We currently hand out "const
>>>>> char *" as return values from many public API calls and these rely on the
>>>>> strings living forever. We many many existing clients and we can't change
>>>>> the public API. So I vote to avoid this. We are already using LLVM string
>>>>> pools under the covers and we also associate the mangled/demangled names
>>>>> in the map as is saves us a ton of cycles when parsing stuff as we never
>>>>> demangle (very expensive) the same name twice. So I don't see the need to
>>>>> change this as it is already backed by LLVM technology under the covers.
>>>>>
>>>>>> • thread_local instead of lldb::ThreadLocal
>>>>>> • This fixes a number of bugs on Windows that cannot be
>>>>>> fixed otherwise, as they require compiler support.
>>>>>> • Some other compilers may not support this yet?
>>>>>> • Risk: 2
>>>>>> • Impact: 3
>>>>>> • Difficulty: 3
>>>>>
>>>>> As long as all compilers support this then this is fine.
>>>>>
>>>>>> • Use llvm::cl for the command line arguments to the primary
>>>>>> lldb executable.
>>>>>> • Risk: 2
>>>>>> • Impact: 3
>>>>>> • Difficulty / Effort: 4
>>>>>
>>>>> Easy and fine to switch over to. We might need to port some getopt_long
>>>>> functionality into it if it doesn't handle all of the things that
>>>>> getopt_long does. For example arguments and options can be interspersed
>>>>> in getopt_long(). You can also terminate your arguments with "--". It
>>>>> accepts single dashes for long option names if they don't conflict with
>>>>> short options. Many things like this.
>>>>>
>>>>>> • Testing - Our testing infrastructure is unstable, and our test
>>>>>> coverage is lacking. We should take steps to improve this.
>>>>>> • Port as much as possible to lit
>>>>>> • Simple tests should be trivial to port to lit today.
>>>>>> If nothing else this serves as a proof of concept while increasing the
>>>>>> speed and stability of the test suite, since lit is a more stable
>>>>>> harness.
>>>>>
>>>>> As long a tests use the public API to run test. We are not doing text
>>>>> scraping.
>>>>>
>>>>>> • Separate testing tools
>>>>>> • One question that remains open is how to represent
>>>>>> the complicated needs of a debugger in lit tests. Part a) above covers
>>>>>> the trivial cases, but what about the difficult cases? In
>>>>>> https://reviews.llvm.org/D24591 a number of ideas were discussed. We
>>>>>> started getting to this idea towards the end, about a separate tool
>>>>>> which has an interface independent of the command line interface and
>>>>>> which can be used to test. lldb-mi was mentioned. While I have serious
>>>>>> concerns about lldb-mi due to its poorly written and tested codebase, I
>>>>>> do agree in principle with the methodology. In fact, this is the entire
>>>>>> philosophy behind lit as used with LLVM, clang, lld, etc.
>>>>>
>>>>> We have a public API... Why are we going to go and spend _any_ time on
>>>>> doing anything else? I just don't get it. What a waste of time. We have a
>>>>> public API. Lets use it. Not simple enough for people? Tough. Testing a
>>>>> debugger should be done using the public API except when we are actually
>>>>> trying to test the LLDB command line in pexpect like tests. Those and
>>>>> only those are fine to covert over to using LIT and use text scraping.
>>>>> But 95% of our testing should be done using the API that our IDEs use.
>>>>> Using MI? Please no.
>>>>>>
>>>>>> I don’t take full credit for this idea. I had been toying with a
>>>>>> similar idea for some time, but it was further cemented in an offline
>>>>>> discussion with a co-worker.
>>>>>>
>>>>>> There many small, targeted tools in LLVM (e.g. llc, lli, llvm-objdump,
>>>>>> etc) whose purpose are to be chained together to do interesting things.
>>>>>> Instead of a command line api as we think of in LLDB where you type
>>>>>> commands from an interactive prompt, they have a command line api as you
>>>>>> would expect from any tool which is launched from a shell.
>>>>>>
>>>>>> I can imagine many potential candidates for lldb tools of this nature.
>>>>>> Off the top of my head:
>>>>>> • lldb-unwind - A tool for testing the unwinder. Accepts byte code as
>>>>>> input and passes it through to the unwinder, outputting a compressed
>>>>>> summary of the steps taken while unwinding, which could be pattern
>>>>>> matched in lit. The output format is entirely controlled by the tool,
>>>>>> and not by the unwinder itself, so it would be stable in the face of
>>>>>> changes to the underlying unwinder. Could have various options to
>>>>>> enable or disable features of the unwinder in order to force the
>>>>>> unwinder into modes that can be tricky to encounter in the wild.
>>>>>> • lldb-symbol - A tool for testing symbol resolution. Could have
>>>>>> options for testing things like:
>>>>>> • Determining if a symbol matches an executable
>>>>>> • looking up a symbol by name in the debug info, and mapping it
>>>>>> to an address in the process.
>>>>>> • Displaying candidate symbols when doing name lookup in a
>>>>>> particular scope (e.g. while stopped at a breakpoint).
>>>>>> • lldb-breakpoint - A tool for testing breakpoints and stepping.
>>>>>> Various options could include:
>>>>>> • Set breakpoints and out addresses and/or symbol names where
>>>>>> they were resolved to.
>>>>>> • Trigger commands, so that when a breakpoint is hit the tool
>>>>>> could automatically continue and try to run to another breakpoint, etc.
>>>>>> • options to inspect certain useful pieces of state about an
>>>>>> inferior, to be matched in lit.
>>>>>> • lldb-interpreter - tests the jitter etc. I don’t know much about
>>>>>> this, but I don’t see why this couldn’t be tested in a manner similar to
>>>>>> how lli is tested.
>>>>>> • lldb-platform - tests lldb local and remote platform interfaces.
>>>>>> • lldb-cli -- lldb interactive command line.
>>>>>> • lldb-format - lldb data formatters etc.
>>>>>
>>>>>
>>>>> I agree that testing more functionality it a good idea, but if it is
>>>>> worth testing we should see if we can get it into our public API. If so,
>>>>> then we test it there. If not, then we come up with another solution.
>>>>> Even in the alternate solution, it will be something that will probably
>>>>> create structured data (JSON, Yaml, etc) and then parsed, so I would
>>>>> rather spend the time getting these things into out public APIs, or test
>>>>> them vai our public APIs.
>>>>>
>>>>>> • Tests NOW, not later.
>>>>>> • I know we’ve been over this a million times and it’s not
>>>>>> worth going over the arguments again. And I know it’s hard to write
>>>>>> tests, often requiring the invention of new SB APIs. Hopefully those
>>>>>> issues will be addressed by above a) and b) above and writing tests will
>>>>>> be easier. Vedant Kumar ran some analytics on the various codebases and
>>>>>> found that LLDB has the lowest test / commit ratio of any LLVM project
>>>>>> (He didn’t post numbers for lld, so I’m not sure what it is there).
>>>>>> • lldb: 287 of the past 1000 commits
>>>>>> • llvm: 511 of the past 1000 commits
>>>>>> • clang: 622 of the past 1000 commits
>>>>>> • compiler-rt: 543 of the past 1000 commits
>>>>>> This is an alarming statistic, and I would love to see this number
>>>>>> closer to 50%.
>>>>>> • Code style / development conventions - Aside from just the column
>>>>>> limitations and bracing styles, there are other areas where LLDB differs
>>>>>> from LLVM on code style. We should continue to adopt more of LLVM's
>>>>>> style where it makes sense. I've identified a couple of areas
>>>>>> (incomplete list) which I outline below.
>>>>>> • Clean up the mess of cyclical dependencies and properly layer
>>>>>> the libraries. This is especially important for things like lldb-server
>>>>>> that need to link in as little as possible, but regardless it leads to a
>>>>>> more robust architecture, faster build and link times, better
>>>>>> testability, and is required if we ever want to do a modules build of
>>>>>> LLDB
>>>>>> • Use CMake instead of Xcode project (CMake supports
>>>>>> Frameworks). CMake supports Apple Frameworks, so the main roadblock to
>>>>>> getting this working is just someone doing it. Segmenting the build
>>>>>> process by platform doesn't make sense for the upstream, especially when
>>>>>> there is a perfectly workable solution. I have no doubt that the
>>>>>> resulting Xcode workspace generated automatically by CMake will not be
>>>>>> as "nice" as one that is maintained by hand. We face this problem with
>>>>>> Visual Studio on Windows as well. The solution that most people have
>>>>>> adopted is to continue using the IDE for code editing and debugging, but
>>>>>> for actually running the build, use CMake with Ninja. A similar
>>>>>> workflow should still be possible with an OSX CMake build, but as I do
>>>>>> not work every day on a Mac, all I can say is that it's possible, I have
>>>>>> no idea how impactful it would be on peoples' workflows.
>>>>>> • Variable naming conventions
>>>>>> • I don’t expect anyone is too fond of LLDB’s naming
>>>>>> conventions, but if we’re committed to joining the LLVM ecosystem, then
>>>>>> let’s go all the way.
>>>>>
>>>>> Hopefully we can get LLVM and Clang to adopt naming member variables with
>>>>> a prefix... If so, that is my main remaining concern.
>>>>>
>>>>>> • Use more modern C++ and less C
>>>>>> • Old habits die hard, but this isn’t just a matter of
>>>>>> style. It leads to safer, more robust, and less fragile code as well.
>>>>>> • Shorter functions and classes with more narrowly targeted
>>>>>> responsibilities
>>>>>> • It’s not uncommon to find functions that are hundreds
>>>>>> (and in a few cases even 1,000+) of lines long. We really need to be
>>>>>> better about breaking functions and classes down into smaller
>>>>>> responsibilities. This helps not just for someone coming in to read the
>>>>>> function, but also for testing. Smaller functions are easier to unit
>>>>>> test.
>>>>>> • Convert T foo(X, Y, Error &error) functions to Expected<T>
>>>>>> foo(X, Y) style (Depends on 1.c)
>>>>>> • llvm::Expected is based on the llvm::Error class
>>>>>> described earlier. It’s used when a function is supposed to return a
>>>>>> value, but it could fail. By packaging the error with the return value,
>>>>>> it’s impossible to have a situation where you use the return value even
>>>>>> in case of an error, and because llvm::Error has mandatory checking,
>>>>>> it’s also impossible to have a sitaution where you don’t check the
>>>>>> error. So it’s very safe.
>>>>>
>>>>> As crashes if you don't check the errors. One of the big differences
>>>>> between LLDB and LLVM/Clang is that asserts are used liberally all over
>>>>> clang which make the code very crashy when used in production with
>>>>> uncontrolled input like we get in the debugger. We will need to be very
>>>>> careful with any changes to make sure we don't increase crash frequency.
>>>>>
>>>>>>
>>>>>> Whew. That was a lot. If you made it this far, thanks for reading!
>>>>>>
>>>>>> Obviously if we were to embark on all of the above, it would take many
>>>>>> months to complete everything. So I'm not proposing anyone stop what
>>>>>> they're doing to work on this. This is just my own personal wishlist
>>>>>
>>>>> There are many great things in here. As long as we are careful to not
>>>>> increase the crash frequency in LLDB I am all for it. My mains areas of
>>>>> concern are:
>>>>> - public API can't change in its current form
>>>>> - no LLVM or clang classes in the public API as arguments or return
>>>>> values.
>>>>> - don't crash more by introducing assert heavy code into code paths that
>>>>> use input from outside sources
>>>>
>>>> It seems to me that you are mixing two things: asserting on user inputs vs
>>>> asserting on internal invariants of the system. LLVM is very intensive
>>>> about asserting on the second category and this seems fine to me.
>>>> Asserting on external inputs is not great (in LLVM as much as in LLDB).
>>>>
>>>> The asserting error class above falls into the second category and is a
>>>> great tool to enforce programmer error
>>>>
>>>> —
>>>> Mehdi
_______________________________________________
lldb-dev mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev