> On Feb 11, 2015, at 11:23 AM, Zachary Turner <[email protected]> wrote:
>
>
>
> On Wed Feb 11 2015 at 11:13:22 AM Greg Clayton <[email protected]> wrote:
>
> > On Feb 11, 2015, at 10:18 AM, Zachary Turner <[email protected]> wrote:
> >
> > A patch is up which fixes a number of issues with strings not being null
> > terminated in LLDB. The issues and error-proneness of using raw c strings
> > is well documented and understood, so I would like to propose banning them
> > in LLDB for new code [1].
> >
> > I realize that's a tall order, and I don't expect all the occurrences of
> > them to go away overnight, but issues related to strings have been popping
> > up over and over again, and it would be nice if we could deal with more
> > interesting problems.
> >
> > LLVM has a very robust string manipulation library, and while it's not
> > perfect, it would have prevented almost all of these issues from ever
> > happening in the first place. Here is a quick summary of common C string
> > types / operations and their corresponding LLVM equivalents:
> >
> > Types:
> > char stack_buffer[n]; // Use llvm::SmallString<n>.
>
> That would be fine.
>
> > const char* foo; // Use llvm::StringRef foo.
>
> This may be fine as long as there is absolutely _no_ string manipulation on
> "foo" (like "strcat(...) etc. Use std::string for anything that requires
> storage. And we don't want to start using any code that does "let me assign
> an llvm::StringRef() with a ConstString("hello").GetCString() so I don't have
> to worry about ownership. Adding string to the constant string pool should
> be reserved for things that actually need it (any const names like
> namespaces, variable names, type names, paths, etc) and also for when we
> return "const char *" through the public API.
>
> This also means that we can't trust that "foo" is NULL terminated
> (llvm::StringRef objects have a data pointer and a length and aren't required
> to be terminated) and it means we must call "foo.str().c_str()" any time we
> require NULL termination. So I actually don't like llvm::StringRef as a
> replacement "const char *"...
> We can't assume that const char*s are null terminated either.
Yes we can, that is was is assumed by const char * right now for every call we
use it with.
> They're just pointers to raw buffers, which you may or may not have null
> terminated. So we have the same issue.
No we don't, it isn't part of the the way you normally use c-strings.
llvm::StringRef doesn't need to NULL terminate as it is part of its design.
> By replacing all uses of const char* with StringRef(), we end up with exactly
> the same semantics. null terminated strings are null terminated, non null
> terminated strings aren't.
But that isn't what is currently meant by _any_ of the functions that take a
"const char *" with no length parameter, so it means that any switch to using
llvm::StringRef requires we use ".str().c_str()" which I would like to avoid.
>
>
>
> > Passing arguments;
> > void foo(char *, int len); // Use void foo(llvm::SmallVectorImpl<char>
> > &buf);
>
> I would prefer to use std::string here. Clearer and everyone knows how to use
> it. SmallVectors are nice, but they have the overhead of storing more data on
> the stack. Clang currently can create quite large stack frames when you have
> some code that has large switch statements where each switch statement has
> local variables. If those switch statements how SmallVector<char, PATH_MAX>,
> then you now have a case where if this code is called recursively we end up
> blowing the stack. We have seen this a lot in the DWARF parser, so I don't
> want to encourage using SmallVector for strings, since many places we use
> strings are for paths and other larger buffers. Any current std::string
> implementation can store up to 24 characters in the std::string without
> needing to do any allocations on the heap, so that essentially gets us a
> SmallVector<char, 24>. So I vote std::string.
> passing SmallVectorImpl<> is intended as a replacement for the case where you
> have this:
>
> void foo(char *buf);
> void bar() {
> char buffer[n];
> foo(buffer);
> }
>
> The equivalent LLVM code is this:
>
> void foo(llvm::SmallVectorImpl<char> &buf);
> void bar() {
> llvm::SmallString<n> buffer;
> foo(buffer);
> }
>
> Exactly the same amount of stack space. It's not intended to be a general
> replacement for std::string, which stores its data entirely on the heap.
std::string doesn't always store content on the heap. It stores the first ~20
bytes in the std::string itself (at least for any STL implementation that is
worth anything). So I really don't see the need to start using a new string
class when std::string is just about all we need and it will take up less space
on the stack for reasons I mentioned above with switch statements (3 pointers
at most).
> I do think that many places in LLDB that currently use stack buffers could be
> re-written just as well using heap buffers, and for that I agree std::string
> is fine.
Agreed.
>
>
> > void foo(const char * foo); // Use void foo(llvm::StringRef foo);
>
>
> Again, I don't like this because you must call "foo.str().c_str()" in order
> to guarantee NULL termination.
> As mentioned previously, in the const char * version you have to guarantee
> that it was initialized with a null terminated string in the first place.
> The burden is the same either way. Either we're already making the
> assumption in the callee, in which case we can continue to make it with the
> StringRef(), or we aren't making the assumption in the callee, in which case
> we can continue not making the assumption but have the added benefit of
> having the length field passed to us.
I disagree. Passing "const char*" assumes NULL termination unless a length
field is also supplied. Passing llvm::StringRef as an object that is known to
not NULL terminate, then we have can't assume termination. So I don't agree
that passing llvm::StringRef makes anything better.
>
>
> >
> > Operations:
> > strcpy, strncpy, etc // Use member functions of SmallString /
> > StringRef
> > sprintf, snprintf, etc // Use llvm::raw_ostream
>
> We have StringStream, people should be using that. raw_ostream doesn't
> support all printf style formats. So all sprintf and snprintf function calls
> should switch to using lldb_private::StreamString.
> I actually disagree here. Printf() has plenty of problems, such as
> non-portable and confusing arguments,
I agree on this one, but we are getting around this with using PRI macros.
> lack of type safety,
most of this is taken care of when you have the printf format warnings on your
class.
> possibility of buffer overruns.
There are no buffer overruns in the Stream::Printf(). Check the implementation
if you need to.
> I would actually prefer to use raw_ostream instead of StringStream.
I don't. I have written tools using streaming style stuff before (like
std::cout) and also written using printf style stuff and I find the printf
style stuff more readable and maintainable. I also don't like streams that
maintain state like the std::ostream does. Not sure if raw_ostream does any
state management, but when they do it causes bugs later for code that doesn't
always explicitly set the format, width, precision and much more:
std::ostream &os;
os << std::hex << hex_num;
dump(os);
os << hex_num2;
little did we know the stream was modified by "dump()" to set the format to
decimal so now you must always use:
os << std::hex << hex_num2;
(repeat for number width, precision and many other factors).
_______________________________________________
lldb-dev mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev