I think designing good traits to support all these text implementations is
far more important than whatever Hungarian notation is preferred for
literals.


Kevin


On Thu, Sep 19, 2013 at 2:50 PM, Martin DeMello <[email protected]> wrote:

> Ah, good point. You could fix it by having a very small whitelist of
> acceptable delimiters, but that probably takes it into overcomplex
> territory.
>
> martin
>
> On Thu, Sep 19, 2013 at 2:46 PM, Kevin Ballard <[email protected]> wrote:
> > As I just responded to Masklinn, this is ambiguous. How do you lex
> > `do R{foo()}`?
> >
> > -Kevin
> >
> > On Sep 19, 2013, at 2:41 PM, Martin DeMello <[email protected]> wrote:
> >
> >> Yes, I figured R followed by a non-alphabetical character could serve
> >> the same purpose as Ruby's %<char>.
> >>
> >> martin
> >>
> >> On Thu, Sep 19, 2013 at 2:37 PM, Kevin Ballard <[email protected]> wrote:
> >>> I didn't look at Ruby's syntax, but what you just described sounds a
> >>> little too free-form to me. I believe Ruby at least requires a % as
> >>> part of the syntax, e.g. %q{test}. But I don't think %R{test} is a good
> >>> idea for Rust, as it would conflict with the % operator. I don't think
> >>> other punctuation would work well either.
> >>>
> >>> -Kevin
> >>>
> >>> On Sep 19, 2013, at 2:10 PM, Martin DeMello <[email protected]> wrote:
> >>>
> >>>> How complicated would it be to use R"" but with arbitrary paired
> >>>> delimiters (the way, for instance, Ruby does it)? It's very handy to
> >>>> pick a delimiter you know does not appear in the string, e.g. if you
> >>>> had a string containing ')' you could use R{this is a string with a )
> >>>> in it} or R|this is a string with a ) in it|.
> >>>>
> >>>> martin
> >>>>
> >>>> On Thu, Sep 19, 2013 at 1:36 PM, Kevin Ballard <[email protected]> wrote:
> >>>>> One feature common to many programming languages that Rust lacks is
> >>>>> "raw" string literals. Specifically, these are string literals that
> >>>>> don't interpret backslash-escapes. There are three obvious
> >>>>> applications at the moment: regular expressions, Windows file paths,
> >>>>> and format!() strings that want to embed { and } chars. I'm sure there
> >>>>> are more as well, such as large string literals that contain things
> >>>>> like HTML text.
> >>>>>
> >>>>> I took a look at 3 programming languages to see what solutions they
> >>>>> had: D, C++11, and Python. I've reproduced their syntax below, plus
> >>>>> one more custom syntax, along with pros & cons. I'm hoping we can come
> >>>>> up with a syntax that makes sense for Rust.
> >>>>>
> >>>>> ## Python syntax:
> >>>>>
> >>>>> Python supports an "r" or "R" prefix on any string literal (both
> >>>>> "short" strings, delimited with one quote character, and "long"
> >>>>> strings, delimited with 3 quotes). The "r" or "R" prefix denotes a
> >>>>> "raw string", and has the effect of disabling backslash-escapes within
> >>>>> the string. For the most part. It actually gets a bit weird: if a
> >>>>> sequence of backslashes of an odd length occurs prior to a quote (of
> >>>>> the appropriate quote type for the string), then the quote is
> >>>>> considered to be escaped, but the backslashes are left in the string.
> >>>>> This means r"foo\"" evaluates to the string `foo\"`, and similarly
> >>>>> r"foo\\\"" is `foo\\\"`, but r"foo\\" is merely the string `foo\\`.
> >>>>>
> >>>>> Pros:
> >>>>> * Simple syntax
> >>>>> * Allows for embedding the closing quote character in the raw string
> >>>>>
> >>>>> Cons:
> >>>>> * Handling of backslashes is very bizarre, and the closing quote
> >>>>>   character can only be embedded if you want to have a backslash
> >>>>>   before it.
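
The Python behaviour described above can be checked directly. This is only a small sketch: the variable names are illustrative, and each raw literal is compared against an ordinary (escaped) string spelling of the same text.

```python
# Raw literals from the description above, compared with their escaped spellings.
odd_one = r"foo\""      # one backslash before the quote: the quote is kept, and so is the backslash
odd_three = r"foo\\\""  # three backslashes before the quote: same effect
even_two = r"foo\\"     # two backslashes: the quote that follows closes the literal

assert odd_one == 'foo\\"'            # f o o \ "
assert odd_three == 'foo\\\\\\"'      # f o o \ \ \ "
assert even_two == 'foo\\\\'          # f o o \ \

# The quote can only ever appear with a backslash in front of it.
assert '"' in odd_one and odd_one[odd_one.index('"') - 1] == '\\'
```

Note that there is no way to spell a raw string that ends in a lone quote with no backslash before it, which is the con listed above.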
> >>>>>
> >>>>> ## C++11 syntax:
> >>>>>
> >>>>> C++11 allows for raw strings using a sequence of the form
> >>>>> R"seq(raw text)seq". In this construct, `seq` is any sequence of (zero
> >>>>> or more) characters except for: space, (, ), \, \t, \v, \n, \r. The
> >>>>> simplest form looks like R"(raw text)", which allows for anything in
> >>>>> the raw text except for the sequence `)"`. The addition of the
> >>>>> delimiter sequence allows for constructing a raw string containing any
> >>>>> sequence at all (as the delimiter sequence can be adjusted based on
> >>>>> the represented text).
> >>>>>
> >>>>> Pros:
> >>>>> * Allows for embedding any character at all (representable in the
> >>>>>   source file encoding), including the closing quote.
> >>>>> * Reasonably straightforward
> >>>>>
> >>>>> Cons:
> >>>>> * Syntax is slightly complicated
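
To make the R"seq(raw text)seq" rule concrete, here is a rough sketch of a scanner for it, written in Python for brevity. The function name `parse_cpp_raw_string` and the `FORBIDDEN` set are made up for illustration; the set simply mirrors the excluded characters listed above. It is not how any real C++ or Rust lexer is implemented.

```python
# Characters the description above excludes from the delimiter sequence.
FORBIDDEN = set(' ()\\\t\v\n\r')

def parse_cpp_raw_string(src):
    """Parse one R"seq(raw text)seq" literal at the start of src.

    Returns (raw_text, rest_of_source). Hypothetical helper for illustration.
    """
    if not src.startswith('R"'):
        raise ValueError('expected R"')
    open_paren = src.find('(', 2)
    if open_paren == -1:
        raise ValueError("missing '(' after the delimiter sequence")
    seq = src[2:open_paren]
    if any(ch in FORBIDDEN for ch in seq):
        raise ValueError("invalid character in delimiter sequence")
    # The literal ends at the first occurrence of )seq" after the open paren.
    closer = ')' + seq + '"'
    end = src.find(closer, open_paren + 1)
    if end == -1:
        raise ValueError("unterminated raw string")
    return src[open_paren + 1:end], src[end + len(closer):]

# Simplest form: nothing is interpreted between the parentheses.
assert parse_cpp_raw_string('R"(raw text)"') == ('raw text', '')
# A delimiter sequence lets the text itself contain )" .
assert parse_cpp_raw_string('R"x(a)"b)x" tail') == ('a)"b', ' tail')
```

The point of the sketch is that the lexer never has to look inside the raw text: it only searches for the one closing sequence chosen by the author of the literal.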
> >>>>>
> >>>>> ## D syntax:
> >>>>>
> >>>>> D supports three different forms of raw strings. The first two are
> >>>>> similar, being r"raw text" and `raw text`. Besides the choice of
> >>>>> delimiters, they behave identically, in that the raw text may contain
> >>>>> anything except for the appropriate quote character. The third syntax
> >>>>> is a slightly more complicated form of C++11's syntax, and is called a
> >>>>> delimited string. It takes two forms.
> >>>>>
> >>>>> The first looks like q"(raw text)" where the ( may be any
> >>>>> non-identifier non-whitespace character. If the character is one of
> >>>>> [(<{ then it is a "nesting delimiter", and the close delimiter must be
> >>>>> the matching ])>} character, otherwise the close delimiter is the same
> >>>>> as the open. Furthermore, nesting delimiters do exactly what their
> >>>>> name says: they nest. If the nesting delimiter is (), then any ( in
> >>>>> the raw text must be balanced with a ) in the raw text. In other
> >>>>> words, q"(foo(bar))" evaluates to "foo(bar)", but q"(foo(bar)" and
> >>>>> q"(foobar))" are both illegal.
> >>>>>
> >>>>> The second uses any identifier as the delimiter. In this case, the
> >>>>> identifier must immediately be followed by a newline, and in order to
> >>>>> close the string, the close delimiter must be preceded by a newline.
> >>>>> This looks like
> >>>>>
> >>>>> q"delim
> >>>>> this is some raw text
> >>>>> delim"
> >>>>>
> >>>>> It's essentially a heredoc. Note that the first newline is not part
> >>>>> of the string, but the final newline is, so this evaluates to
> >>>>> "this is some raw text\n".
> >>>>>
> >>>>> Pros:
> >>>>> * Flexible
> >>>>> * Allows for constructing a raw string that contains any desired
> >>>>>   sequence of characters (representable in the source file's encoding)
> >>>>>
> >>>>> Cons:
> >>>>> * Overly complicated
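
The nesting rule for D's first delimited form is the part that needs a counter rather than a plain search. A minimal sketch, in Python, assuming the whole input is one q"..." literal; `parse_d_delimited` and `NESTING` are hypothetical names, and the check that the delimiter is a non-identifier, non-whitespace character is left out for brevity.

```python
# Open delimiters that nest, mapped to their matching close delimiter.
NESTING = {'(': ')', '[': ']', '<': '>', '{': '}'}

def parse_d_delimited(src):
    """Return the text of a q"X...Y" literal that spans all of src."""
    if not src.startswith('q"') or len(src) < 4:
        raise ValueError('expected q" followed by a delimiter')
    opener = src[2]
    closer = NESTING.get(opener, opener)  # non-nesting: close is same as open
    depth = 1
    i = 3
    while i < len(src):
        ch = src[i]
        if opener in NESTING and ch == opener:
            depth += 1          # a nested open delimiter inside the raw text
        elif ch == closer:
            depth -= 1
            if depth == 0:
                break           # this is the close delimiter of the literal
        i += 1
    else:
        raise ValueError("unbalanced or unterminated delimiter")
    if src[i + 1:] != '"':
        raise ValueError('close delimiter must be followed by the final "')
    return src[3:i]

assert parse_d_delimited('q"(foo(bar))"') == 'foo(bar)'
assert parse_d_delimited('q"|foo(bar|"') == 'foo(bar'
```

With this sketch both q"(foo(bar)" and q"(foobar))" raise ValueError, matching the two illegal cases given above.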
> >>>>>
> >>>>> ## Custom syntax
> >>>>>
> >>>>> There's another approach that none of these three languages take,
> >>>>> which is to merely allow for doubling up the quote character in order
> >>>>> to embed a quote. This would look like
> >>>>> R"raw string literal ""with embedded quotes"".", which becomes the
> >>>>> string `raw string literal "with embedded quotes".`
> >>>>>
> >>>>> Pros:
> >>>>> * Very simple
> >>>>> * Allows for embedding the close quote character, and therefore, any
> >>>>>   character (representable in the source file encoding)
> >>>>>
> >>>>> Cons:
> >>>>> * Slightly odd to read
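
The doubled-quote idea above is simple enough to sketch in a few lines. This is an illustration in Python, not a proposal for how a Rust lexer would do it; `parse_doubled_quote_raw` is a made-up name.

```python
def parse_doubled_quote_raw(src):
    """Parse one R"..." literal at the start of src, where "" embeds one quote.

    Returns (text, rest_of_source). Hypothetical helper for illustration.
    """
    if not src.startswith('R"'):
        raise ValueError('expected R"')
    out = []
    i = 2
    while i < len(src):
        if src[i] == '"':
            if src[i + 1:i + 2] == '"':
                out.append('"')   # doubled quote: one literal quote character
                i += 2
                continue
            return ''.join(out), src[i + 1:]  # lone quote: end of the literal
        out.append(src[i])
        i += 1
    raise ValueError("unterminated raw string")

text, rest = parse_doubled_quote_raw('R"raw string literal ""with embedded quotes""."')
assert text == 'raw string literal "with embedded quotes".'
assert rest == ''
```

Unlike the delimiter-sequence approach, the text between the outer quotes is no longer identical to the resulting string, since every embedded quote has to be written twice.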
> >>>>>
> >>>>> ## Conclusion
> >>>>>
> >>>>> Of the three existing syntaxes examined here, I think C++11's is the
> >>>>> best. It ties with D's syntax for being the most powerful, but is
> >>>>> simpler than D's. The custom syntax is just as powerful though. The
> >>>>> benefit of the C++11 syntax over the custom syntax is it's slightly
> >>>>> easier to read the C++11 syntax, as the raw text has a one-to-one
> >>>>> mapping with the resulting string. The custom syntax is a bit more
> >>>>> confusing to read, especially if you want to add multiple quotes. As a
> >>>>> pathological case, let's try representing a Python triple-quoted
> >>>>> docstring using both syntaxes:
> >>>>>
> >>>>> C++11: R"("""this is a python docstring""")"
> >>>>> Custom: R"""""""this is a python docstring"""""""
> >>>>>
> >>>>> Based on this examination, I'm leaning towards saying Rust should
> >>>>> support C++11's raw string literal syntax.
> >>>>>
> >>>>> I welcome any comments, criticisms, or suggestions.
> >>>>>
> >>>>> -Kevin
> >>>>> _______________________________________________
> >>>>> Rust-dev mailing list
> >>>>> [email protected]
> >>>>> https://mail.mozilla.org/listinfo/rust-dev