Re: String Literals, take 2

James Mastros Mon, 02 Dec 2002 09:19:54 -0800

Just a few more nits to pick...

On 12/02/2002 6:58 AM, Joseph F. Ryan wrote:

The q() operator allows strings to be made with
any non-space, non-letter, non-digit character as the delimiter instead
of '.  In addition, if the starting delimiter is a part of a paired
set, such as (, [, <, or {, then the closing delimiter may be the
matching member of the set.  In addition, the reverse holds true;
delimiters which are the tail end of a pair may use the starting item
as the closing delimiter.

We need to decide if this is a user doc or a developer doc/language specification. If it's the latter, we need a rigorous definition of what a pair is.

There are a few special cases for delimiters; specifically : and #.
: is not allowed because it might be used by custom-defined quoting
operators to apply a property; # is allowed, but there cannot be a
space between the operator and the #.  In addition, comments are not
allowed within # delimited expressions (for obvious reasons).

Are comments ever allowed within q() constructs? If not, ditch the statement about comments not being allowed in q## constructs.

=head3 <<>>; expanding a string as a list.


A set of braces is a special op that evaluates into the list of words

A doubled set of angle brackets (<<text here>>) or a set of double-angle quotation marks (guillemets, «text here»).

contained, using whitespace as the delimiter.  It is similar to qw()
from perl5, and can be thought of as roughly equivalent to:

Are we getting rid of qw()? I assumed that we were keeping it as a longhand form of <<>>/guillemets, just like qq() is the longhand form of "".

C<< "STRING".split(' ') >>

I'd be more explicit here, and say C<<"STRING".split(/\s+/)>>. (The two are equivalent, but only because of special-casing; the second is more explicit.)
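For comparison, here is how the two forms behave in Perl 5 today. This is only a sketch of current Perl 5 behavior, not a statement about what Perl 6 will do; the sample string is made up. In Perl 5 the special-cased ' ' form also skips leading whitespace, while the regex form does not, so the wording of the equivalence may need to say which behavior is meant.

```perl
use strict;
use warnings;

# Sample input with leading and trailing whitespace (made up for illustration).
my $text = "  foo bar\tbaz  ";

# split ' ' is special-cased: leading whitespace is skipped before splitting.
my @special = split ' ', $text;

# split /\s+/ is an ordinary regex split: leading whitespace produces an
# empty first field (trailing empty fields are dropped in both cases).
my @regex = split /\s+/, $text;

print scalar(@special), " fields: ", join('|', @special), "\n";  # 3 fields: foo|bar|baz
print scalar(@regex),   " fields: ", join('|', @regex),   "\n";  # 4 fields: |foo|bar|baz
```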

=head2 Interpolating Constructs

Interpolating constructs are another form of string in which variables
that are embedded into the string are expanded into their value at
runtime.  Interpolated strings are formed using the double quote:

...using double quotes, as in "string".

"string". In addition, qq() is a synonym for "", which is similar to
q() being a synonym for ''.

...similarly to...

=item Hashes: C<"%hash">, C<"%(expression)">
Hashes interpolate by joining their pairs on their .separator property,
which by default is a newline.  Pairs stringify by joining the key and
value with the hash's .pairsep property, which by default is a space.

Have these defaults been defined somewhere? I'd rather see them be ', ' and '=>' by default...
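To make the difference between the two sets of defaults concrete, here is a rough Perl 5 sketch. The helper hash_to_string is hypothetical and only imitates the described .separator/.pairsep behavior; it is not a Perl 6 API. Keys are sorted purely so that the output is repeatable.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Perl 6 API): join each key and value with
# $pairsep, then join the resulting pairs with $separator. Keys are sorted
# only to make the output repeatable; real hash order is unspecified.
sub hash_to_string {
    my ($hash, $separator, $pairsep) = @_;
    return join $separator,
           map { $_ . $pairsep . $hash->{$_} } sort keys %$hash;
}

my %h = (a => 1, b => 2);

my $draft    = hash_to_string(\%h, "\n", " ");   # defaults from the draft
my $proposed = hash_to_string(\%h, ", ", "=>");  # defaults suggested above

print "$draft\n";     # prints "a 1" and "b 2" on separate lines
print "$proposed\n";  # prints "a=>1, b=>2"
```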

Note that hashes are unordered, and so the output will be unordered.
Therefore, the following two expressions are equivalent:

Get rid of the therefore; it seems to refer to the preceding sentence, which has nothing to do with the example.

=item Subroutines and Methods: C<"&sub($a1,$a2)">, C<"$obj.meth($a)">
Subroutines and Methods will interpolate their return value into the
string, which will be handled in whichever type the return value is.
Same for object methods.  Note that parens B<are> required during
interpolation so that the parser can disambiguate between object
methods and object members.

Has this been vetted? $(...)/etc seem to cover this case, and & being a qq() metachar makes using qq() strings to print HTML/XML difficult.

=item Escaped Characters
# Basically the same as Perl5; also, how are locale semantics handled?

   \t            tab
   \n            newline
   \r            return
   \f            form feed
   \b            backspace
   \a            alarm (bell)
   \e            escape

Can we get some rigor here? Also, is \n the same everywhere, or do we play the same tricks we did with it in p5? (I think it should be the same everywhere, a CR char, "\cM". Disciplines, or encodings, or whatever we're calling them, can take care of it on IO.) Oh, and it might be nice for \0 to be NUL. (This used to be implicit with \0 as octal, but since \0 isn't octal anymore...)

   \b10        binary char
   \o33        octal char

Numeric Literals, take 3 (http:[EMAIL PROTECTED]/msg00462.html), in the "*** Bin/Hex/Oct shorthands" section, gives 0c123 as the shorthand form of octal numbers, so it doesn't make much sense for octal character constants to be \o123. Do we want to change shorthand octal literal numbers to 0o123 (I don't like this, it's hard to read), change octal chars to \c123 (can't do this without getting rid of, or changing, \c for control-character), get rid of octal chars entirely, or something else? (Barring a good "something else", I vote for killing octal chars.)

   \x1b        hex char

Exactly two digits after the \x? Perl5 attempts to do the right thing either way, but this can be confusing too -- "\xA" eq chr(0xA), "\xABar" eq chr(0xAB)."ar", "\xAQux" eq chr(0xA)."Qux".
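The three Perl 5 cases above can be checked directly. This is a sketch of current Perl 5 behavior only; the variable names are mine.

```perl
use strict;
use warnings;

# Perl 5 reads at most two hex digits after an unbraced \x.
my $one_digit  = "\xA";      # a single hex digit
my $two_digits = "\xABar";   # "AB" is consumed; "ar" is literal text
my $stops_at_q = "\xAQux";   # "Q" is not a hex digit, so only "A" is consumed

# %vd prints the ordinal of each character, joined by dots.
printf "%vd\n", $_ for $one_digit, $two_digits, $stops_at_q;
# prints:
#   10
#   171.97.114
#   10.81.117.120
```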

   \x{263a}    wide hex char
   \c[            control char

Rigor? What is \c~? perl5 thinks it's >, should perl6 agree? How about \c\x{1000} (that's invalid, but you get the point), is that equiv to \x{ff9c}? What about \cé, (e+acute accent), does that capitalize, then subtract 64, or just subtract?
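As a starting point for a rigorous definition, here is a model of what Perl 5 does for ASCII input: it uppercases the character and then XORs with 64, rather than subtracting. The ctrl sub is my own illustration, not a proposal for Perl 6, and it says nothing about the non-ASCII cases asked about above.

```perl
use strict;
use warnings;

# Model of Perl 5's rule for \cX on ASCII input: uppercase X, then flip
# bit 6 (XOR with 64). Illustrative only; non-ASCII input is not modelled.
sub ctrl {
    my ($char) = @_;
    return chr(ord(uc $char) ^ 64);
}

printf "\\c%s -> %d\n", $_, ord(ctrl($_)) for '[', 'M', 'm', '?', '~';
# prints:
#   \c[ -> 27
#   \cM -> 13
#   \cm -> 13
#   \c? -> 127
#   \c~ -> 62
```

Because the operation is XOR, \c? comes out as DEL (127) instead of a negative ordinal, and \c~ comes out as 62, which is ">".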

   \N{name}    named Unicode character

Reference to charnames pragmata, or however we end up defining the exact semantics of \N. (Since we don't know yet, just put in a FIXME, I suppose.)

Is there any way to give the ordinal in decimal, like "\d192"? (I'm not sure how useful this would be, but it would be nice parallelism. OTOH, you can use chr() easily enough.)

=item Modifiers: C<\Q{}>, C<\L{}>, C<\U{}>

Modifiers apply a modification to text which they enclose; they can be
embedded within interpolated strings.

   \L{}        Lowercase all characters within brackets
   \U{}        Uppercase all characters within brackets
   \Q{}        Escape all characters that need escaping
               within brackets (except "}")

Rigor: escape all non-alphanumerics.
Do we still have the other modifiers that p5 supports, \l and \u? Do we want a new titlecase modifier, \T{james mastros} eq "James Mastros", doing the Right Thing for other languages, where it isn't so simple (there are complicated cases for this, but IIRC Unicode defines a robust algo to do this). I'll check on the Unicode stuff if anybody thinks it's a good idea... I'm uncertain, myself; I never liked the qq() case-modifiers, so I don't use them.
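For reference, the Perl 5 functions behind these modifiers behave as below. This is a sketch: the titlecase line is a naive, ASCII-oriented stand-in for the hypothetical \T{} and is not the Unicode titlecasing algorithm. Note that Perl 5's quotemeta leaves "_" alone, so "non-alphanumerics" would need to say whether underscore counts.

```perl
use strict;
use warnings;

# \Q corresponds to quotemeta(): every ASCII character outside
# [A-Za-z0-9_] gets a backslash, so "_" is left alone.
my $quoted = quotemeta('a.b_c d');

# \u and \l affect only the next character.
my $ucfirst = ucfirst('james');
my $lcfirst = lcfirst('JAMES');

# Naive titlecase for the hypothetical \T{}: assumes single spaces between
# words and simple casing rules. Not the Unicode algorithm.
my $title = join ' ', map { ucfirst(lc($_)) } split / /, 'james mastros';

print "$quoted\n";   # a\.b_c\ d
print "$ucfirst\n";  # James
print "$lcfirst\n";  # jAMES
print "$title\n";    # James Mastros
```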

A string which is (possibly) interpolated and then executed as a system
command with /bin/sh or its equivalent.   Shell wildcards, pipes, and
redirections will be honored. The collected standard output of the
command is returned; standard error is unaffected. In scalar context,
it comes back as a single (potentially multi-line) string, or undef if
the command failed. In list context, returns a list of lines split
on the standard input separator, or an empty list if the command
failed.

This whole section is very unix-centric, but I'm not certain what to do about that -- the functionality is very system-specific. Also, I suspect we're going to want to rewrite it anyway when we hammer out iterators, files, and context.

A line-oriented form of quoting is based on the shell "here-document"

s/shell/Unix Bourne shell/

syntax.  Following a << you specify a string to terminate the quoted
material, and all lines following the current line down to the
terminating string are the value of the item. The terminating string
may be either an identifier (a word), or some quoted text. If quoted,
the type of quotes you use determines the treatment of the text, just
as in regular quoting. An unquoted identifier works like double quotes.
The terminating string must appear by itself, and any preceding or
following whitespace on the terminating line is discarded.

I could have sworn that Larry recently put something out about the edge cases between << heredoc and << beginning-of-qw. I /think/ he said that qw("Foo" bar) must be written as << "Foo" bar>>, because otherwise it would be interpreted as a here-doc ending with Foo with double-quote interpolation. Can anybody find this, or is Larry watching?

Also note that with single quoted here-docs, backslashes are not
special, and are taken for a literal backslash, a behavior that is
different from normal single-quoted strings.

Are \qq()s still special, even in <<'noninterpolating's? Either way, it should be explicitly noted.
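The difference the draft describes is easy to see in Perl 5, which is presumably where the wording comes from. A small sketch (variable names are mine), showing Perl 5 behavior only:

```perl
use strict;
use warnings;

# In an ordinary single-quoted string, \\ collapses to one backslash.
my $quoted = 'a\\b';

# In a single-quoted here-doc, nothing is special: both backslashes survive.
my $heredoc = <<'END';
a\\b
END
chomp $heredoc;

print length($quoted), " ", length($heredoc), "\n";   # prints "3 4"
```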

V-Strings are formed when 3 or more digits are joined by decimal points,
with a possible leading v.  The resulting item is then treated like
a string, rather than a number.

=over 3
Examples:
 $var = v5.8.0; # $var = "5.8.0";
 $var = 192.168.0.1; # $var = "192.168.0.1";
=back

Note that the v is non-optional for two-character v-strings.

I'd say something like:
V-strings are actually strings that just happen to look like numbers. Each dot-separated number is transformed into the character with that Unicode ordinal, and the resulting characters are concatenated together.

(The transformation from normal string to v-string looks like C<<$vstring='v' ~ join '.', map {ord} split //, $instring>>; the transformation from v-string to normal string looks like
C<<print join '', map {chr} split /\./, $vstring>>;
(Where vstring cannot begin with a leading 'v', for purposes of illustration.))

Thus, C<<80.101.114.108.32.54.33 eq 'Perl 6!'>>

Also, your examples are misleading at best. v5.8.0 eq "\x05\x08\x00".
192.168.0.1 eq chr(192)~chr(168)~chr(0)~chr(1).
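These claims can be checked against Perl 5's existing v-string support. The following is a sketch of Perl 5 behavior, with variable names of my own choosing; it does not settle what Perl 6 should do.

```perl
use strict;
use warnings;

# A v-string literal is the string of characters with the given ordinals.
my $version  = v5.8.0;
my $address  = 192.168.0.1;               # two or more dots: also a v-string
my $greeting = 80.101.114.108.32.54.33;

# Round trip between a plain string and its dotted-decimal form.
my $dotted = join '.', map { ord } split //, 'Perl 6!';
my $plain  = join '',  map { chr } split /\./, $dotted;

print sprintf('%vd', $version), "\n";   # 5.8.0
print "$greeting\n";                    # Perl 6!
print "$dotted\n";                      # 80.101.114.108.32.54.33
```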

-=- James Mastros
