eugene marked 4 inline comments as done. eugene added a comment.
> I think we should do some performance measurements to see whether this needs > more optimising or it's fine as is. > > I propose the following benchmark: > > bin/lldb bin/clang > > make sure clang is statically linked > breakpoint set --name non_exisiting_function > > Setting the breakpoint on a nonexistent function avoids us timing the > breakpoint setting machinery, while still getting every symbol in the > executable parsed. I did some micro-benchmarking and on average new parser is ~3 time slower than the old one. (new parser - ~200k string/s, old parser - ~700k string/s) clang::Lexer appears to be the slowest part of it. On the clang breakpoint benchmark you proposed, it is hard to notice much of a difference. clang binary has about 500k functions. With new parser it takes about 11s to try to set a breakpoint, on the old one it's about 10s. (release version of LLDB, debug static version of clang) I decided to use a hybrid approach, when we use old parsing code for simple cases and call new parser only when it fails. 80% of clang functions are simple enough that we don't really need the new parser, so it helped to bring clang breakpoint test back to 10s. I think it's reasonable to assume that similar distribution is true for most programs, and most of their functions can be parsed with the old code. I don't think we can really improve performance of a new parser without giving up on clang::Lexer, and I'm reluctant to do it. So I propose to keep hybrid approach and call a new parser only for complicated cases. ================ Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp:94 if (!m_parsed && m_full) { - // ConstString mangled; - // m_full.GetMangledCounterpart(mangled); - // printf ("\n parsing = '%s'\n", m_full.GetCString()); - // if (mangled) - // printf (" mangled = '%s'\n", mangled.GetCString()); - m_parse_error = false; - m_parsed = true; - llvm::StringRef full(m_full.GetCString()); - - size_t arg_start, arg_end; - llvm::StringRef parens("()", 2); - if (ReverseFindMatchingChars(full, parens, arg_start, arg_end)) { - m_arguments = full.substr(arg_start, arg_end - arg_start + 1); - if (arg_end + 1 < full.size()) - m_qualifiers = full.substr(arg_end + 1); - if (arg_start > 0) { - size_t basename_end = arg_start; - size_t context_start = 0; - size_t context_end = llvm::StringRef::npos; - if (basename_end > 0 && full[basename_end - 1] == '>') { - // TODO: handle template junk... - // Templated function - size_t template_start, template_end; - llvm::StringRef lt_gt("<>", 2); - if (ReverseFindMatchingChars(full, lt_gt, template_start, - template_end, basename_end)) { - // Check for templated functions that include return type like: - // 'void foo<Int>()' - context_start = full.rfind(' ', template_start); - if (context_start == llvm::StringRef::npos) - context_start = 0; - else - ++context_start; - - context_end = full.rfind(':', template_start); - if (context_end == llvm::StringRef::npos || - context_end < context_start) - context_end = context_start; - } else { - context_end = full.rfind(':', basename_end); - } - } else if (context_end == llvm::StringRef::npos) { - context_end = full.rfind(':', basename_end); - } - - if (context_end == llvm::StringRef::npos) - m_basename = full.substr(0, basename_end); - else { - if (context_start < context_end) - m_context = - full.substr(context_start, context_end - 1 - context_start); - const size_t basename_begin = context_end + 1; - m_basename = - full.substr(basename_begin, basename_end - basename_begin); - } - m_type = eTypeUnknownMethod; - } else { - m_parse_error = true; - return; - } - - if (!IsValidBasename(m_basename)) { - // The C++ basename doesn't match our regular expressions so this can't - // be a valid C++ method, clear everything out and indicate an error - m_context = llvm::StringRef(); - m_basename = llvm::StringRef(); - m_arguments = llvm::StringRef(); - m_qualifiers = llvm::StringRef(); - m_parse_error = true; - } + CPlusPlusNameParser parser(m_full.GetStringRef()); + auto function = parser.ParseAsFunctionDefinition(); ---------------- labath wrote: > How about the following api: > ``` > if (auto function​ = > CPlusPlusNameParser::ParseAsFunctionDefinition(m_full.GetStringRef())) { > ... > ``` If you don't mind I'll leave it as it is. I understand that it's very tempting to have two simple functions ParseAsFunctionDefinition and ParseAsFullName instead of a class, but I can imagine calling second one if first one fails, and in this case it'll be good that parser doesn't need to tokenize string all over again. ================ Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp:160 + auto full_name = parser.ParseAsFullName(); + if (full_name.hasValue()) { + identifier = full_name.getValue().m_basename; ---------------- labath wrote: > Same here see above ================ Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp:62 +bool CPlusPlusNameParser::HasMoreTokens() { + return m_next_token_index < static_cast<int>(m_tokens.size()); +} ---------------- labath wrote: > Wouldn't it be better to change the type of m_next_token_index to size_t? Unsigned types are dangerous. I prefer to avoid them wherever I can. As far as I remember many members of C++ standard commit publicly regretted the fact that .size() returns an unsigned type. ================ Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp:188 + // as arithmetic or shift operators not as template brackets. + // Examples: std::enable_if<(10u)<(64), bool> + // f<A<operator<(X,Y)::Subclass>> ---------------- labath wrote: > Is this really the case for the types we are interested in? I would have > hoped that this would get simplified to `std::enable_if<true, bool> before it > reaches us? Yes. I was very surprised as well. But apparently compiler sometimes is being lazy and doesn't simplify everything :( Running: ~$ nm -C --defined-only clang | grep std::enable_if Gives a few symbols like this: std::enable_if<(10u)<(64), bool>::type llvm::isUInt<10u>(unsigned long) ================ Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp:572 + clang::SourceLocation end_location = last_token.getLocation(); + const char *begin_ptr = m_text.data() + begin_location.getRawEncoding(); + const char *end_ptr = ---------------- labath wrote: > Could this be written as: > `m_text.take_front(end_pos).drop_front(start_pos)`? Thanks! I still have a lot to learn about llvm classes. ================ Comment at: source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.h:94 + // when destructed unless it's manually removed with Remove(). + class Bookmark { + public: ---------------- labath wrote: > Please make the type move-only. Otherwise you will have a fun time debugging > accidental copies. (You already have one, although it is benign now) Good catch! Thanks! ================ Comment at: unittests/Language/CPlusPlus/CPlusPlusLanguageTest.cpp:106 + EXPECT_EQ(test.qualifiers, method.GetQualifiers()) + << method.GetQualifiers(); EXPECT_EQ(test.scope_qualified_name, method.GetScopeQualifiedName()); ---------------- labath wrote: > Would defining operator<< for std::ostream and StringRef ( local tomthis > test) enable you to get rid of these? > I've ran into this before but was never annoyed enough to actually do it.. :/ I tried to to it, but it didn't work for me after a few tries. I decided to just convert it to std::string, it solves the same problem. https://reviews.llvm.org/D31451 _______________________________________________ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits