To clarify: you may assume that lines in the string are separated by
"\n", but any solution must pass the following edge cases:
1) empty string: @lines should contain zero elements
2) string of "\n": @lines should contain one empty element
3) trailing empty lines should be retained
4) you may not assume that the string is properly newline-terminated
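The four edge cases above can be checked mechanically. Here is a minimal sketch, assuming the split-then-chomp approach; the helper name split_lines and the sample inputs are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the post):
# split at line starts, keep trailing fields, then strip the newlines.
sub split_lines {
    my ($string) = @_;
    my @lines = split /^/, $string, -1;
    chomp @lines;
    return @lines;
}

# One sample input per edge case: [ description, input, expected lines ]
my @cases = (
    [ 'empty string',         "",        [] ],
    [ 'single newline',       "\n",      [""] ],
    [ 'trailing empty lines', "a\n\n\n", ["a", "", ""] ],
    [ 'no final newline',     "a\nb",    ["a", "b"] ],
);

for my $case (@cases) {
    my ($name, $input, $expected) = @$case;
    my @got = split_lines($input);
    # Same element count and no position where the elements differ
    my $ok = @got == @$expected
          && !grep { $got[$_] ne $expected->[$_] } 0 .. $#got;
    print(($ok ? "ok" : "NOT ok"), " - $name\n");
}
```

Swapping the body of split_lines for another candidate is a quick way to see whether it survives all four cases before bothering to time it.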
For cheap thrills, I benchmarked some solutions that pass all
the edge cases.
use strict;
use Benchmark;
my $x = <<'FLAMING_OSTRICHES';
This is first test line
This is 2nd
And 3rd
FLAMING_OSTRICHES
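# a1: split at line starts (keeping trailing fields), then chomp the whole list
sub a1 { my @lines = split(/^/, $x, -1); chomp(@lines) }
# a2: capture each line's content with /^.*/mg; "" is special-cased to zero lines
sub a2 { my @lines = $x eq "" ? () : $x =~ /^.*/mg }
# j1: like a1, but chomp each piece inside map
sub j1 { my @lines = map { chomp; $_ } split /^/, $x, -1 }
# w1: read from an in-memory filehandle (requires Perl 5.8.0 or later)
sub w1 { open(my $fh, "<", \$x); my @lines = <$fh>; chomp(@lines) }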
timethese(600000, {
    'a1' => \&a1,
    'a2' => \&a2,
    'j1' => \&j1,
    'w1' => \&w1,
});
Results on Linux, Perl 5.8.0:
a1: 27 wallclock secs (15.06 usr + 0.01 sys = 15.07 CPU)
a2: 42 wallclock secs (24.06 usr + 0.04 sys = 24.10 CPU)
j1: 49 wallclock secs (27.84 usr + 0.04 sys = 27.88 CPU)
w1: 101 wallclock secs (62.74 usr + 0.04 sys = 62.78 CPU)
Why is a1 fastest? I'm not sure, but I noticed this in the Camel's entry on split:
"the patterns /\s+/, /^/ and / / are specially optimized".
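This does not show the optimization itself, but it does show the related special-casing that perlfunc documents for split: a bare /^/ pattern is treated as if it were /^/m. A small sketch, with a made-up sample string:

```perl
use strict;
use warnings;

# Sample text is an assumption for illustration only.
my $text = "one\ntwo\nthree";

# In split, a bare /^/ behaves like /^/m and breaks at every line start.
my @plain = split /^/,  $text;
my @multi = split /^/m, $text;

# In an ordinary match, ^ without /m anchors only at the start of the string.
my $matches_without_m = ($text =~ /^two/)  ? 1 : 0;
my $matches_with_m    = ($text =~ /^two/m) ? 1 : 0;

printf "split /^/: %d pieces, split /^/m: %d pieces\n",
    scalar @plain, scalar @multi;
printf "/^two/ matches: %d, /^two/m matches: %d\n",
    $matches_without_m, $matches_with_m;
```

So the /^/ in a1 and j1 needs no /m modifier, whereas the plain match in a2 does.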
BTW, an interesting technique, described at:
http://www.ccl4.org/~nick/P/Fast_Enough/
is to examine the ops. For example:
perl -MO=Terse -e'my @lines = split(/^/, $x, -1); chomp(@lines)'
perl -MO=Terse -e'my @lines = map { chomp; $_ } split /^/, $x, -1;'
/-\