As I just responded to Masklinn, this is ambiguous. How do you lex `do 
R{foo()}`?

-Kevin

On Sep 19, 2013, at 2:41 PM, Martin DeMello <[email protected]> wrote:

> Yes, I figured R followed by a non-alphabetical character could serve
> the same purpose as ruby's %<char>.
> 
> martin
> 
> On Thu, Sep 19, 2013 at 2:37 PM, Kevin Ballard <[email protected]> wrote:
>> I didn't look at Ruby's syntax, but what you just described sounds a little 
>> too free-form to me. I believe Ruby at least requires a % as part of the 
>> syntax, e.g. %q{test}. But I don't think %R{test} is a good idea for Rust, 
>> as it would conflict with the % operator. I don't think other punctuation 
>> would work well either.
>> 
>> -Kevin
>> 
>> On Sep 19, 2013, at 2:10 PM, Martin DeMello <[email protected]> wrote:
>> 
>>> How complicated would it be to use R"" but with arbitrary paired
>>> delimiters (the way, for instance, ruby does it)? It's very handy to
>>> pick a delimiter you know does not appear in the string, e.g. if you
>>> had a string containing ')' you could use R{this is a string with a )
>>> in it} or R|this is a string with a ) in it|.
>>> 
>>> martin
>>> 
>>> On Thu, Sep 19, 2013 at 1:36 PM, Kevin Ballard <[email protected]> wrote:
>>>> One feature common to many programming languages that Rust lacks is "raw" 
>>>> string literals. Specifically, these are string literals that don't 
>>>> interpret backslash-escapes. There are three obvious applications at the 
>>>> moment: regular expressions, Windows file paths, and format!() strings 
>>>> that want to embed { and } chars. I'm sure there are more as well, such as 
>>>> large string literals that contain things like HTML text.
>>>> 
>>>> I took a look at 3 programming languages to see what solutions they had: 
>>>> D, C++11, and Python. I've reproduced their syntax below, plus one more 
>>>> custom syntax, along with pros & cons. I'm hoping we can come up with a 
>>>> syntax that makes sense for Rust.
>>>> 
>>>> ## Python syntax:
>>>> 
>>>> Python supports an "r" or "R" prefix on any string literal (both "short" 
>>>> strings, delimited with a single quote, and "long" strings, delimited with 
>>>> 3 quotes). The "r" or "R" prefix denotes a "raw string", and has the 
>>>> effect of disabling backslash-escapes within the string, for the most 
>>>> part. It actually gets a bit weird: if a sequence of backslashes of an odd 
>>>> length occurs prior to a quote (of the appropriate quote type for the 
>>>> string), then the quote is considered to be escaped, but the backslashes 
>>>> are left in the string. This means r"foo\"" evaluates to the string 
>>>> `foo\"`, and similarly r"foo\\\"" is `foo\\\"`, but r"foo\\" is merely the 
>>>> string `foo\\`.
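
The backslash behaviour described here can be checked in Python directly. This is only a small sketch: `is_valid_literal` is a helper name made up for the illustration (it just wraps the built-in `compile`), and the variable names are mine.

```python
# The three raw-string literals discussed above, evaluated by Python itself.
escaped_quote = r"foo\""       # odd run of backslashes before the quote
three_slashes = r"foo\\\""     # still odd: the quote stays inside the string
two_slashes = r"foo\\"         # even run: the quote closes the string


def is_valid_literal(source):
    """Return True if `source` compiles as a Python expression.

    Hypothetical helper for this example; it only wraps the built-in compile().
    """
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# Backslashes are kept verbatim, so the lengths count every one of them.
print(len(escaped_quote), len(three_slashes), len(two_slashes))

# A raw string cannot end in an odd number of backslashes: the source text
# r"foo\" leaves the literal unterminated.
print(is_valid_literal('r"foo\\"'))
```

The last check is the practical consequence of the odd-length rule: a raw string that should end in a single backslash cannot be written as one raw literal.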
>>>> 
>>>> Pros:
>>>> * Simple syntax
>>>> * Allows for embedding the closing quote character in the raw string
>>>> 
>>>> Cons:
>>>> * Handling of backslashes is very bizarre, and the closing quote character 
>>>> can only be embedded if you want to have a backslash before it.
>>>> 
>>>> ## C++11 syntax:
>>>> 
>>>> C++11 allows for raw strings using a sequence of the form R"seq(raw 
>>>> text)seq". In this construct, `seq` is any sequence of zero to 16 
>>>> characters except for: space, (, ), \, \t, \v, \f, \n. The simplest form 
>>>> looks like R"(raw text)", which allows for anything in the raw text except 
>>>> for the sequence `)"`. The addition of the delimiter sequence allows for 
>>>> constructing a raw string containing any sequence at all (as the delimiter 
>>>> sequence can be adjusted based on the represented text).
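
To make the delimiter rule concrete, here is a rough sketch of how such a literal could be decoded. It is written in Python purely for illustration, it is not how any C++ compiler does it, and `parse_cpp11_raw` is a hypothetical name. It assumes the input holds exactly one complete literal, and it rejects any whitespace character in the delimiter, which is slightly stricter than necessary.

```python
def parse_cpp11_raw(literal):
    """Return the text of a C++11-style raw literal R"seq(text)seq".

    Hypothetical helper: it expects `literal` to hold exactly one complete
    literal and raises ValueError otherwise.
    """
    if not literal.startswith('R"'):
        raise ValueError("missing R-quote prefix")
    open_paren = literal.find("(", 2)
    if open_paren == -1:
        raise ValueError("missing opening parenthesis")
    seq = literal[2:open_paren]
    # The delimiter may not contain parentheses, backslashes, or whitespace,
    # and C++11 limits it to 16 characters.
    if len(seq) > 16 or any(ch in ")\\" or ch.isspace() for ch in seq):
        raise ValueError("invalid delimiter sequence")
    # The literal ends at the first ")" that is followed by seq and a quote.
    closer = ")" + seq + '"'
    end = literal.find(closer, open_paren + 1)
    if end == -1:
        raise ValueError("unterminated raw literal")
    if end + len(closer) != len(literal):
        raise ValueError("trailing characters after the literal")
    return literal[open_paren + 1:end]


print(parse_cpp11_raw('R"(raw text)"'))
# With a non-empty delimiter, the text may contain the sequence )" itself.
print(parse_cpp11_raw('R"x(a )" b)x"'))
```

The point of the delimiter is visible in the second call: the `)"` inside the text does not end the literal, because only `)x"` does.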
>>>> 
>>>> Pros:
>>>> * Allows for embedding any character at all (representable in the source 
>>>> file encoding), including the closing quote.
>>>> * Reasonably straightforward
>>>> 
>>>> Cons:
>>>> * Syntax is slightly complicated
>>>> 
>>>> ## D syntax:
>>>> 
>>>> D supports three different forms of raw strings. The first two are 
>>>> similar, being r"raw text" and `raw text`. Besides the choice of 
>>>> delimiters, they behave identically, in that the raw text may contain 
>>>> anything except for the appropriate quote character. The third syntax is a 
>>>> slightly more complicated form of C++11's syntax, and is called a 
>>>> delimited string. It takes two forms.
>>>> 
>>>> The first looks like q"(raw text)" where the ( may be any non-identifier 
>>>> non-whitespace character. If the character is one of [(<{ then it is a 
>>>> "nesting delimiter", and the close delimiter must be the matching ])>} 
>>>> character, otherwise the close delimiter is the same as the open. 
>>>> Furthermore, nesting delimiters do exactly what their name says: they 
>>>> nest. If the nesting delimiter is (), then any ( in the raw text must be 
>>>> balanced with a ) in the raw text. In other words, q"(foo(bar))" evaluates 
>>>> to "foo(bar)", but q"(foo(bar)" and q"(foobar))" are both illegal.
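
The nesting rule is easier to see as a depth counter. Below is a minimal sketch in Python, under the assumption that the whole literal is handed over as one string; `parse_d_delimited` and `NESTING` are names invented for this example, not anything from the D implementation. It only models the balancing rule and does not decide whether a non-nesting delimiter may appear inside the text.

```python
# Opening nesting delimiters and their matching closers.
NESTING = {"(": ")", "[": "]", "<": ">", "{": "}"}


def parse_d_delimited(literal):
    """Return the text of a D-style q"(...)" literal (hypothetical helper)."""
    if len(literal) < 5 or not literal.startswith('q"') or not literal.endswith('"'):
        raise ValueError("not a complete delimited literal")
    opener = literal[2]
    if opener.isalnum() or opener == "_" or opener.isspace():
        raise ValueError("delimiter must be a non-identifier, non-whitespace character")
    # Non-nesting delimiters close with the same character they opened with.
    closer = NESTING.get(opener, opener)
    body = literal[3:-1]  # the text followed by the close delimiter
    if not body.endswith(closer):
        raise ValueError("missing close delimiter")
    text = body[:-1]
    if opener in NESTING:
        depth = 0
        for ch in text:
            if ch == opener:
                depth += 1
            elif ch == closer:
                depth -= 1
                if depth < 0:
                    raise ValueError("unbalanced close delimiter in text")
        if depth != 0:
            raise ValueError("unbalanced open delimiter in text")
    return text


print(parse_d_delimited('q"(foo(bar))"'))
```

Both of the illegal examples above, q"(foo(bar)" and q"(foobar))", raise ValueError in this sketch because the counter does not return to zero or drops below it.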
>>>> 
>>>> The second uses any identifier as the delimiter. In this case, the 
>>>> identifier must immediately be followed by a newline, and in order to 
>>>> close the string, the close delimiter must be preceded by a newline. This 
>>>> looks like
>>>> 
>>>> q"delim
>>>> this is some raw text
>>>> delim"
>>>> 
>>>> It's essentially a heredoc. Note that the first newline is not part of the 
>>>> string, but the final newline is, so this evaluates to "this is some raw 
>>>> text\n".
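
The newline handling can be sketched the same way. This is a rough illustration in Python, not D's actual lexer: `parse_d_heredoc` is a hypothetical name, and Python's `str.isidentifier` stands in as an approximation of what D accepts as an identifier.

```python
def parse_d_heredoc(literal):
    """Return the text of a D-style identifier-delimited string (sketch)."""
    if not literal.startswith('q"'):
        raise ValueError("missing q-quote prefix")
    # The identifier runs up to the first newline, which is not part of the text.
    delim, newline, rest = literal[2:].partition("\n")
    if not newline or not delim.isidentifier():
        raise ValueError("identifier delimiter must be followed by a newline")
    # The close delimiter must be preceded by a newline. Prepending one lets
    # that rule also hold when the text is empty.
    closer = "\n" + delim + '"'
    haystack = "\n" + rest
    end = haystack.find(closer)
    if end == -1:
        raise ValueError("unterminated literal")
    if end + len(closer) != len(haystack):
        raise ValueError("trailing characters after the literal")
    # Keep the newline that precedes the close delimiter: it belongs to the text.
    return haystack[1:end + 1]


print(repr(parse_d_heredoc('q"delim\nthis is some raw text\ndelim"')))
```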
>>>> 
>>>> Pros:
>>>> * Flexible
>>>> * Allows for constructing a raw string that contains any desired sequence 
>>>> of characters (representable in the source file's encoding)
>>>> 
>>>> Cons:
>>>> * Overly complicated
>>>> 
>>>> ## Custom syntax
>>>> 
>>>> There's another approach that none of these three languages take, which is 
>>>> to merely allow for doubling up the quote character in order to embed a 
>>>> quote. This would look like R"raw string literal ""with embedded 
>>>> quotes"".", which becomes `raw string literal "with embedded quotes".`.
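
A quick sketch of the doubled-quote scheme, in Python for illustration only. Both function names are made up, and the decoder assumes it is given exactly one complete literal.

```python
def encode_doubled(text):
    """Write `text` as an R"..." literal by doubling every quote (sketch)."""
    return 'R"' + text.replace('"', '""') + '"'


def decode_doubled(literal):
    """Recover the text of an R"..." literal with doubled quotes (sketch)."""
    if len(literal) < 3 or not literal.startswith('R"') or not literal.endswith('"'):
        raise ValueError("not a complete literal")
    body = literal[2:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == '"':
            # A quote inside the body is only legal as one half of a pair.
            if i + 1 < len(body) and body[i + 1] == '"':
                out.append('"')
                i += 2
            else:
                raise ValueError("unescaped quote inside literal")
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


print(decode_doubled('R"raw string literal ""with embedded quotes""."'))
print(encode_doubled('"""this is a python docstring"""'))
```

The second call shows where the scheme gets hard to read: three quotes at either end of the text turn into a run of seven in the literal.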
>>>> 
>>>> Pros:
>>>> * Very simple
>>>> * Allows for embedding the close quote character, and therefore, any 
>>>> character (representable in the source file encoding)
>>>> 
>>>> Cons:
>>>> * Slightly odd to read
>>>> 
>>>> ## Conclusion
>>>> 
>>>> Of the three existing syntaxes examined here, I think C++11's is the best. 
>>>> It ties with D's syntax for being the most powerful, but is simpler than 
>>>> D's. The custom syntax is just as powerful though. The benefit of the 
>>>> C++11 syntax over the custom syntax is that it's slightly easier to read, 
>>>> as the raw text has a one-to-one mapping with the resulting 
>>>> string. The custom syntax is a bit more confusing to read, especially if 
>>>> you want to add multiple quotes. As a pathological case, let's try 
>>>> representing a Python triple-quoted docstring using both syntaxes:
>>>> 
>>>> C++11: R"("""this is a python docstring""")"
>>>> Custom: R"""""""this is a python docstring"""""""
>>>> 
>>>> Based on this examination, I'm leaning towards saying Rust should support 
>>>> C++11's raw string literal syntax.
>>>> 
>>>> I welcome any comments, criticisms, or suggestions.
>>>> 
>>>> -Kevin
>>>> _______________________________________________
>>>> Rust-dev mailing list
>>>> [email protected]
>>>> https://mail.mozilla.org/listinfo/rust-dev
>> 
