I am trying to understand the differences in the way the 'split' function works between Perl5 and Perl6.

Consider this string:

#####
$str = q|This is a string to be split|;
#####

Let's suppose I wish to split this string on the multi-character delimiter string 'tri'. The results are the same in both languages.

#####
# Case 1
$ perl -e 'my ($str, @rv);$str = q|This is a string to be split|; @rv = split(q|tri|, $str); print "<$_>" for @rv; print "\n";'
<This is a s><ng to be split>
#####
# Case 2
$ perl6 -e 'my ($str, @rv);$str = q|This is a string to be split|; @rv = split(q|tri|, $str); print "<$_>" for @rv; print "\n";'
<This is a s><ng to be split>
#####

Now let's suppose that in Perl5 I wish to split the string on a pattern which is the character class /[tri]/. I get:

#####
# Case 3
$ perl -e 'my ($str, @rv);$str = q|This is a string to be split|; @rv = split(/[tri]/, $str); print "<$_>" for @rv; print "\n";'
<Th><s ><s a s><><><ng ><o be spl>
#####

The result is a list of strings which do not contain any of 't', 'r' or 'i'. Where two of the delimiters occurred consecutively in the original string, I get an empty string -- except that empty strings at the end of the list are dropped.

Now let's run the same code in Perl6:

#####
# Case 4
$ perl6 -e 'my ($str, @rv);$str = q|This is a string to be split|; @rv = split(/[tri]/, $str); print "<$_>" for @rv; print "\n";'
<This is a s><ng to be split>
#####

I'm surprised to get exactly the same output as in Cases 1 and 2, where my delimiter was the multi-character string 'tri'. The '[' and ']' characters do not seem to indicate "character class" at all. It's as if '/[...]/' magically turns into 'q|...|'. What am I not grasping here?

One more case: When, in Perl6, I surround the brackets with angle brackets, I get somewhat more expected behavior:

#####
# Case 5
$ perl6 -e 'my ($str, @rv);$str = q|This is a string to be split|; @rv = split(/<[tri]>/, $str); print "<$_>" for @rv; print "\n";'
<Th><s ><s a s><><><ng ><o be spl><><>
#####

I get something very similar to Case 3, which was written in Perl5, viz., a list of strings which do not contain any of 't', 'r' or 'i'. Where two of the delimiters occurred consecutively in the original string, I get an empty string -- including at the end of the original string. So, does that mean that, in Perl6, to split a string on a character class, I always have to wrap the square brackets in angle brackets to indicate that they denote a character class?

Thank you very much.
Jim Keenan
