I am trying to understand the differences in the way the 'split'
function works between Perl5 and Perl6.
Consider this string:
#####
$str = q|This is a string to be split|;
#####
Let's suppose I wish to split this string on the multi-character
delimiter string 'tri'. The results are the same in both languages.
#####
# Case 1
$ perl -e 'my ($str, @rv);$str = q|This is a string to be split|; @rv =
split(q|tri|, $str); print "<$_>" for @rv; print "\n";'
<This is a s><ng to be split>
#####
# Case 2
$ perl6 -e 'my ($str, @rv);$str = q|This is a string to be split|; @rv =
split(q|tri|, $str); print "<$_>" for @rv; print "\n";'
<This is a s><ng to be split>
#####
Now let's suppose that in Perl5 I wish to split the string on a pattern
which is the character class /[tri]/. I get:
#####
# Case 3
$ perl -e 'my ($str, @rv);$str = q|This is a string to be split|; @rv =
split(/[tri]/, $str); print "<$_>" for @rv; print "\n";'
<Th><s ><s a s><><><ng ><o be spl>
#####
The result is a list of strings which do not contain any of 't', 'r' or
'i'. Where two of the delimiters occurred consecutively in the original
string, I get an empty string -- except that empty strings at the end of
the list are dropped.

Now let's run the same code in Perl6:
#####
# Case 4
$ perl6 -e 'my ($str, @rv);$str = q|This is a string to be split|; @rv =
split(/[tri]/, $str); print "<$_>" for @rv; print "\n";'
<This is a s><ng to be split>
#####
I'm surprised to get exactly the same output as in Cases 1 and 2, where
my delimiter was the multi-character string 'tri'. The '[' and ']'
characters do not seem to indicate "character class" at all. It's as if
'/[...]/' magically turns into 'q|...|'. What am I not grasping here?

One more case: when, in Perl6, I surround the square brackets with angle
brackets, I get behavior closer to what I expected:
#####
# Case 5
$ perl6 -e 'my ($str, @rv);$str = q|This is a string to be split|; @rv =
split(/<[tri]>/, $str); print "<$_>" for @rv; print "\n";'
<Th><s ><s a s><><><ng ><o be spl><><>
#####
I get something very similar to Case 3, which was written in Perl5,
viz., a list of strings which do not contain any of 't', 'r' or 'i'.
Where two of the delimiters occurred consecutively in the original
string, I get an empty string -- including at the end of the original
string. So, does that mean that, in Perl6, to split a string on a
character class, I have to always indicate (via the angle brackets) that
the character class is a list?
Thank you very much.
Jim Keenan