I think designing good traits to support all these text implementations is far more important than whatever Hungarian notation is preferred for literals.

Kevin

On Thu, Sep 19, 2013 at 2:50 PM, Martin DeMello <[email protected]> wrote:

> Ah, good point. You could fix it by having a very small whitelist of
> acceptable delimiters, but that probably takes it into overcomplex
> territory.
>
> martin
>
> On Thu, Sep 19, 2013 at 2:46 PM, Kevin Ballard <[email protected]> wrote:
> > As I just responded to Masklinn, this is ambiguous. How do you lex `do R{foo()}`?
> >
> > -Kevin
> >
> > On Sep 19, 2013, at 2:41 PM, Martin DeMello <[email protected]> wrote:
> >
> >> Yes, I figured R followed by a non-alphabetical character could serve
> >> the same purpose as Ruby's %<char>.
> >>
> >> martin
> >>
> >> On Thu, Sep 19, 2013 at 2:37 PM, Kevin Ballard <[email protected]> wrote:
> >>> I didn't look at Ruby's syntax, but what you just described sounds a little too free-form to me. I believe Ruby at least requires a % as part of the syntax, e.g. %q{test}. But I don't think %R{test} is a good idea for Rust, as it would conflict with the % operator. I don't think other punctuation would work well either.
> >>>
> >>> -Kevin
> >>>
> >>> On Sep 19, 2013, at 2:10 PM, Martin DeMello <[email protected]> wrote:
> >>>
> >>>> How complicated would it be to use R"" but with arbitrary paired
> >>>> delimiters (the way, for instance, Ruby does it)? It's very handy to
> >>>> pick a delimiter you know does not appear in the string, e.g. if you
> >>>> had a string containing ')' you could use R{this is a string with a )
> >>>> in it} or R|this is a string with a ) in it|.
> >>>>
> >>>> martin
> >>>>
> >>>> On Thu, Sep 19, 2013 at 1:36 PM, Kevin Ballard <[email protected]> wrote:
> >>>>> One feature common to many programming languages that Rust lacks is "raw" string literals. Specifically, these are string literals that don't interpret backslash-escapes. There are three obvious applications at the moment: regular expressions, Windows file paths, and format!() strings that want to embed { and } chars.
> >>>>> I'm sure there are more as well, such as large string literals that contain things like HTML text.
> >>>>>
> >>>>> I took a look at 3 programming languages to see what solutions they had: D, C++11, and Python. I've reproduced their syntax below, plus one more custom syntax, along with pros & cons. I'm hoping we can come up with a syntax that makes sense for Rust.
> >>>>>
> >>>>> ## Python syntax:
> >>>>>
> >>>>> Python supports an "r" or "R" prefix on any string literal (both "short" strings, delimited with a single quote, and "long" strings, delimited with 3 quotes). The "r" or "R" prefix denotes a "raw string", and has the effect of disabling backslash-escapes within the string. For the most part. It actually gets a bit weird: if a sequence of backslashes of an odd length occurs prior to a quote (of the appropriate quote type for the string), then the quote is considered to be escaped, but the backslashes are left in the string. This means r"foo\"" evaluates to the string `foo\"`, and similarly r"foo\\\"" is `foo\\\"`, but r"foo\\" is merely the string `foo\\`.
> >>>>>
> >>>>> Pros:
> >>>>> * Simple syntax
> >>>>> * Allows for embedding the closing quote character in the raw string
> >>>>>
> >>>>> Cons:
> >>>>> * Handling of backslashes is very bizarre, and the closing quote character can only be embedded if you want to have a backslash before it.
> >>>>>
> >>>>> ## C++11 syntax:
> >>>>>
> >>>>> C++11 allows for raw strings using a sequence of the form R"seq(raw text)seq". In this construct, `seq` is any sequence of (zero or more) characters except for: space, (, ), \, \t, \v, \n, \r. The simplest form looks like R"(raw text)", which allows for anything in the raw text except for the sequence `)"`. The addition of the delimiter sequence allows for constructing a raw string containing any sequence at all (as the delimiter sequence can be adjusted based on the represented text).
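
A minimal sketch of the C++11-style scanning rule quoted above, written in Rust. The function name `parse_cpp_raw` is hypothetical and is not part of any compiler or library; the sketch assumes the input is exactly one complete literal and only enforces the delimiter restrictions listed in the quoted text.

```rust
/// Hypothetical helper (illustration only): given the full source text of one
/// C++11-style raw string literal, R"seq(raw text)seq", return the raw text.
/// Returns None if the literal is malformed.
fn parse_cpp_raw(src: &str) -> Option<&str> {
    // The literal must start with R followed by a double quote.
    let rest = src.strip_prefix("R\"")?;

    // Everything up to the first '(' is the delimiter sequence `seq`.
    let open = rest.find('(')?;
    let delim = &rest[..open];

    // `seq` may not contain space, ')', backslash, or these control characters
    // ('\x0B' is vertical tab, written \v in C++).
    if delim
        .chars()
        .any(|c| matches!(c, ' ' | ')' | '\\' | '\t' | '\x0B' | '\n' | '\r'))
    {
        return None;
    }

    // The raw text runs until the first occurrence of `)seq"`.
    let body = &rest[open + 1..];
    let closer = format!("){}\"", delim);
    let end = body.find(&closer)?;

    // Assumption of this sketch: nothing may follow the closing quote.
    if end + closer.len() != body.len() {
        return None;
    }
    Some(&body[..end])
}
```

With an empty `seq` the raw text may not contain `)"`, and picking a non-empty `seq` such as `x` lifts that restriction, because the scanner then looks for `)x"` instead.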
> >>>>>
> >>>>> Pros:
> >>>>> * Allows for embedding any character at all (representable in the source file encoding), including the closing quote.
> >>>>> * Reasonably straightforward
> >>>>>
> >>>>> Cons:
> >>>>> * Syntax is slightly complicated
> >>>>>
> >>>>> ## D syntax:
> >>>>>
> >>>>> D supports three different forms of raw strings. The first two are similar, being r"raw text" and `raw text`. Besides the choice of delimiters, they behave identically, in that the raw text may contain anything except for the appropriate quote character. The third syntax is a slightly more complicated form of C++11's syntax, and is called a delimited string. It takes two forms.
> >>>>>
> >>>>> The first looks like q"(raw text)" where the ( may be any non-identifier non-whitespace character. If the character is one of [(<{ then it is a "nesting delimiter", and the close delimiter must be the matching ])>} character, otherwise the close delimiter is the same as the open. Furthermore, nesting delimiters do exactly what their name says: they nest. If the nesting delimiter is (), then any ( in the raw text must be balanced with a ) in the raw text. In other words, q"(foo(bar))" evaluates to "foo(bar)", but q"(foo(bar)" and q"(foobar))" are both illegal.
> >>>>>
> >>>>> The second uses any identifier as the delimiter. In this case, the identifier must immediately be followed by a newline, and in order to close the string, the close delimiter must be preceded by a newline. This looks like
> >>>>>
> >>>>> q"delim
> >>>>> this is some raw text
> >>>>> delim"
> >>>>>
> >>>>> It's essentially a heredoc. Note that the first newline is not part of the string, but the final newline is, so this evaluates to "this is some raw text\n".
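
The nesting rule for D's q"(...)" form can be sketched with a simple depth counter. This is only an illustration in Rust: `parse_d_nested` is a hypothetical name, the sketch handles just the four nesting delimiters from the quoted text, and it ignores both the non-nesting delimiters and the identifier (heredoc) form.

```rust
/// Hypothetical helper (illustration only): given the full source text of one
/// D-style delimited string with a nesting delimiter, e.g. q"(foo(bar))",
/// return the raw text, or None if the delimiters do not balance.
fn parse_d_nested(src: &str) -> Option<&str> {
    let rest = src.strip_prefix("q\"")?;

    // The first character selects the delimiter pair.
    let open = rest.chars().next()?;
    let close = match open {
        '(' => ')',
        '[' => ']',
        '<' => '>',
        '{' => '}',
        _ => return None, // non-nesting delimiters are out of scope here
    };

    // Track nesting depth; the first character always brings it to 1.
    let mut depth = 0usize;
    for (i, c) in rest.char_indices() {
        if c == open {
            depth += 1;
        } else if c == close {
            depth -= 1;
            if depth == 0 {
                // The outermost delimiter just closed. The closing quote must
                // come next and must end the literal.
                return if &rest[i + 1..] == "\"" {
                    Some(&rest[1..i])
                } else {
                    None
                };
            }
        }
    }
    // Ran out of input before the outermost delimiter closed.
    None
}
```

The three examples from the quoted text behave as described there: the balanced one yields `foo(bar)`, and the two unbalanced ones are rejected.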
> >>>>>
> >>>>> Pros:
> >>>>> * Flexible
> >>>>> * Allows for constructing a raw string that contains any desired sequence of characters (representable in the source file's encoding)
> >>>>>
> >>>>> Cons:
> >>>>> * Overly complicated
> >>>>>
> >>>>> ## Custom syntax
> >>>>>
> >>>>> There's another approach that none of these three languages take, which is to merely allow for doubling up the quote character in order to embed a quote. This would look like R"raw string literal ""with embedded quotes"".", which becomes `raw string literal "with embedded quotes"`.
> >>>>>
> >>>>> Pros:
> >>>>> * Very simple
> >>>>> * Allows for embedding the close quote character, and therefore, any character (representable in the source file encoding)
> >>>>>
> >>>>> Cons:
> >>>>> * Slightly odd to read
> >>>>>
> >>>>> ## Conclusion
> >>>>>
> >>>>> Of the three existing syntaxes examined here, I think C++11's is the best. It ties with D's syntax for being the most powerful, but is simpler than D's. The custom syntax is just as powerful though. The benefit of the C++11 syntax over the custom syntax is it's slightly easier to read the C++11 syntax, as the raw text has a one-to-one mapping with the resulting string. The custom syntax is a bit more confusing to read, especially if you want to add multiple quotes. As a pathological case, let's try representing a Python triple-quoted docstring using both syntaxes:
> >>>>>
> >>>>> C++11: R"("""this is a python docstring""")"
> >>>>> Custom: R"""""""this is a python docstring"""""""
> >>>>>
> >>>>> Based on this examination, I'm leaning towards saying Rust should support C++11's raw string literal syntax.
> >>>>>
> >>>>> I welcome any comments, criticisms, or suggestions.
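
To make the readability trade-off above concrete, here is a rough Rust sketch of what decoding the doubled-quote "custom" syntax would involve. The name `decode_doubled` is hypothetical, and the sketch assumes the input is exactly one complete literal.

```rust
/// Hypothetical helper (illustration only): decode one literal written in the
/// doubled-quote "custom" syntax, where "" inside the body stands for a single
/// quote character. Returns None if the literal is malformed.
fn decode_doubled(src: &str) -> Option<String> {
    // Strip the opening R" and the closing ".
    let body = src.strip_prefix("R\"")?.strip_suffix('"')?;

    let mut out = String::new();
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c == '"' {
            // A quote in the body must be immediately followed by a second
            // quote; a lone quote would have ended the literal early.
            if chars.next() != Some('"') {
                return None;
            }
        }
        // Push the character once (a doubled quote collapses to one quote).
        out.push(c);
    }
    Some(out)
}
```

Unlike the C++11 form, where the text between the delimiters is the string itself, this form needs a decoding pass, which is the one-to-one mapping point made in the conclusion above.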
> >>>>>
> >>>>> -Kevin
> >>>>> _______________________________________________
> >>>>> Rust-dev mailing list
> >>>>> [email protected]
> >>>>> https://mail.mozilla.org/listinfo/rust-dev
> >>>
> >
> _______________________________________________
> Rust-dev mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/rust-dev
>
