Yes, I figured R followed by a non-alphabetical character could serve
the same purpose as ruby's %<char>.

martin

On Thu, Sep 19, 2013 at 2:37 PM, Kevin Ballard <[email protected]> wrote:
> I didn't look at Ruby's syntax, but what you just described sounds a little
> too free-form to me. I believe Ruby at least requires a % as part of the
> syntax, e.g. %q{test}. But I don't think %R{test} is a good idea for rust, as
> it would conflict with the % operator. I don't think other punctuation would
> work well either.
>
> -Kevin
>
> On Sep 19, 2013, at 2:10 PM, Martin DeMello <[email protected]> wrote:
>
>> How complicated would it be to use R"" but with arbitrary paired
>> delimiters (the way, for instance, ruby does it)? It's very handy to
>> pick a delimiter you know does not appear in the string, e.g. if you
>> had a string containing ')' you could use R{this is a string with a )
>> in it} or R|this is a string with a ) in it|.
>>
>> martin
>>
>> On Thu, Sep 19, 2013 at 1:36 PM, Kevin Ballard <[email protected]> wrote:
>>> One feature common to many programming languages that Rust lacks is "raw"
>>> string literals. Specifically, these are string literals that don't
>>> interpret backslash-escapes. There are three obvious applications at the
>>> moment: regular expressions, windows file paths, and format!() strings that
>>> want to embed { and } chars. I'm sure there are more as well, such as large
>>> string literals that contain things like HTML text.
>>>
>>> I took a look at 3 programming languages to see what solutions they had: D,
>>> C++11, and Python. I've reproduced their syntax below, plus one more custom
>>> syntax, along with pros & cons. I'm hoping we can come up with a syntax
>>> that makes sense for Rust.
>>>
>>> ## Python syntax:
>>>
>>> Python supports an "r" or "R" prefix on any string literal (both "short"
>>> strings, delimited with a single quote, or "long" strings, delimited with 3
>>> quotes). The "r" or "R" prefix denotes a "raw string", and has the effect
>>> of disabling backslash-escapes within the string. For the most part.
>>> It actually gets a bit weird: if a sequence of backslashes of an odd length
>>> occurs prior to a quote (of the appropriate quote type for the string),
>>> then the quote is considered to be escaped, but the backslashes are left in
>>> the string. This means r"foo\"" evaluates to the string `foo\"`, and
>>> similarly r"foo\\\"" is `foo\\\"`, but r"foo\\" is merely the string
>>> `foo\\`.
>>>
>>> Pros:
>>> * Simple syntax
>>> * Allows for embedding the closing quote character in the raw string
>>>
>>> Cons:
>>> * Handling of backslashes is very bizarre, and the closing quote character
>>>   can only be embedded if you want to have a backslash before it.
>>>
>>> ## C++11 syntax:
>>>
>>> C++11 allows for raw strings using a sequence of the form
>>> R"seq(raw text)seq". In this construct, `seq` is any sequence of (zero or
>>> more) characters except for: space, (, ), \, \t, \v, \n, \r. The simplest
>>> form looks like R"(raw text)", which allows for anything in the raw text
>>> except for the sequence `)"`. The addition of the delimiter sequence allows
>>> for constructing a raw string containing any sequence at all (as the
>>> delimiter sequence can be adjusted based on the represented text).
>>>
>>> Pros:
>>> * Allows for embedding any character at all (representable in the source
>>>   file encoding), including the closing quote.
>>> * Reasonably straightforward
>>>
>>> Cons:
>>> * Syntax is slightly complicated
>>>
>>> ## D syntax:
>>>
>>> D supports three different forms of raw strings. The first two are similar,
>>> being r"raw text" and `raw text`. Besides the choice of delimiters, they
>>> behave identically, in that the raw text may contain anything except for
>>> the appropriate quote character. The third syntax is a slightly more
>>> complicated form of C++11's syntax, and is called a delimited string. It
>>> takes two forms.
>>>
>>> The first looks like q"(raw text)" where the ( may be any non-identifier
>>> non-whitespace character.
>>> If the character is one of [(<{ then it is a "nesting delimiter", and the
>>> close delimiter must be the matching ])>} character, otherwise the close
>>> delimiter is the same as the open. Furthermore, nesting delimiters do
>>> exactly what their name says: they nest. If the nesting delimiter is (),
>>> then any ( in the raw text must be balanced with a ) in the raw text. In
>>> other words, q"(foo(bar))" evaluates to "foo(bar)", but q"(foo(bar)" and
>>> q"(foobar))" are both illegal.
>>>
>>> The second uses any identifier as the delimiter. In this case, the
>>> identifier must immediately be followed by a newline, and in order to close
>>> the string, the close delimiter must be preceded by a newline. This looks
>>> like
>>>
>>> q"delim
>>> this is some raw text
>>> delim"
>>>
>>> It's essentially a heredoc. Note that the first newline is not part of the
>>> string, but the final newline is, so this evaluates to "this is some raw
>>> text\n".
>>>
>>> Pros:
>>> * Flexible
>>> * Allows for constructing a raw string that contains any desired sequence
>>>   of characters (representable in the source file's encoding)
>>>
>>> Cons:
>>> * Overly complicated
>>>
>>> ## Custom syntax
>>>
>>> There's another approach that none of these three languages take, which is
>>> to merely allow for doubling up the quote character in order to embed a
>>> quote. This would look like R"raw string literal ""with embedded
>>> quotes"".", which becomes `raw string literal "with embedded quotes".`
>>>
>>> Pros:
>>> * Very simple
>>> * Allows for embedding the close quote character, and therefore, any
>>>   character (representable in the source file encoding)
>>>
>>> Cons:
>>> * Slightly odd to read
>>>
>>> ## Conclusion
>>>
>>> Of the three existing syntaxes examined here, I think C++11's is the best.
>>> It ties with D's syntax for being the most powerful, but is simpler than
>>> D's. The custom syntax is just as powerful though.
>>> The benefit of the C++11 syntax over the custom syntax is that the C++11
>>> syntax is slightly easier to read, as the raw text has a one-to-one mapping
>>> with the resulting string. The custom syntax is a bit more confusing to
>>> read, especially if you want to add multiple quotes. As a pathological
>>> case, let's try representing a Python triple-quoted docstring using both
>>> syntaxes:
>>>
>>> C++11: R"("""this is a python docstring""")"
>>> Custom: R"""""""this is a python docstring"""""""
>>>
>>> Based on this examination, I'm leaning towards saying Rust should support
>>> C++11's raw string literal syntax.
>>>
>>> I welcome any comments, criticisms, or suggestions.
>>>
>>> -Kevin
>>> _______________________________________________
>>> Rust-dev mailing list
>>> [email protected]
>>> https://mail.mozilla.org/listinfo/rust-dev
>
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
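
For anyone who wants to poke at the comparison in the quoted proposal, here is a rough Python sketch. It first checks the Python raw-string behaviour described above, then decodes the two candidate forms (C++11-style R"seq(text)seq" and the doubled-quote "custom" form) on the docstring example. The helper names parse_cpp11_raw and parse_doubled_quote_raw are made up for illustration, they are not part of any proposal or library, and the C++11 helper skips validation of which characters may appear in `seq`.

```python
# Python's own raw strings: the backslash before a quote stays in the string.
a = r"foo\""   # four characters f, o, o, \ followed by a quote
b = r"foo\\"   # f, o, o and two backslashes


def parse_cpp11_raw(literal):
    """Hypothetical helper: decode a C++11-style R"seq(text)seq" literal."""
    if not literal.startswith('R"'):
        raise ValueError("not a raw string literal")
    open_paren = literal.find("(", 2)
    if open_paren == -1:
        raise ValueError("missing '(' after the delimiter sequence")
    seq = literal[2:open_paren]          # may be empty
    closer = ")" + seq + '"'
    end = literal.find(closer, open_paren + 1)
    # The first occurrence of the closer must also be the end of the literal.
    if end == -1 or end + len(closer) != len(literal):
        raise ValueError("missing or misplaced closing delimiter")
    return literal[open_paren + 1:end]


def parse_doubled_quote_raw(literal):
    """Hypothetical helper: decode the custom form, where "" stands for "."""
    if len(literal) < 3 or not literal.startswith('R"') or not literal.endswith('"'):
        raise ValueError("not a raw string literal")
    body = literal[2:-1]
    # Any quote left over after removing pairs is an unescaped quote.
    if '"' in body.replace('""', ''):
        raise ValueError("unpaired quote inside the literal")
    return body.replace('""', '"')


docstring = '"""this is a python docstring"""'
cpp11_form = 'R"("""this is a python docstring""")"'
custom_form = 'R"""""""this is a python docstring"""""""'

assert parse_cpp11_raw(cpp11_form) == docstring
assert parse_doubled_quote_raw(custom_form) == docstring
# A non-empty delimiter lets the text contain the )" sequence.
assert parse_cpp11_raw('R"x(a)"b)x"') == 'a)"b'
```

Both helpers recover the same text. The difference is only in how much the source form has to be rewritten: the C++11-style body is the text verbatim, while the doubled-quote body needs every quote doubled.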
